PROPRIETARY INFORMATION


1 Introduction

Since 16-22 July 1994, when 21 fragments of Comet Shoemaker-Levy 9 smashed into the Jovian atmosphere at speeds of up to 60 km/s, the destructive and explosive nature of the impacts has stimulated growing interest in, and fears of, a similar occurrence on Earth. Impact craters around the world and on the Moon show that the Earth has not escaped asteroid collisions either. Impacts (e.g. the 1908 Tunguska event) and near-misses (e.g. the passes of asteroids 1989 FC and 1991 BA at distances of 1,100,000 km and 170,000 km respectively) by astronomical objects, or near-Earth objects (NEOs), have in recent years made clear the urgent need for a collision awareness (and possibly prevention) programme.

However, near-Earth asteroids are not the only threat. For the past few decades, man has been sending more and more rockets and satellites into space for the purposes of discovery, communication, weather forecasting, military surveillance, etc. Over time, this equipment ages into functional disuse while continuing to orbit the Earth, and in some cases it is gradually pulled back by the Earth's gravitational field. This has resulted in a new threat, that of falling space junk. An NEO detection survey would also help safeguard against such eventualities.

As a first step towards achieving NEO detection and collision prevention goals, a comprehensive survey to compile a detailed inventory of the trajectories and physical characteristics of all asteroids in the near-Earth environment is necessary. Of course, survey programmes of this type also have other spin-offs of geological and astrophysical significance. The composition of NEOs could give vital clues about the materials from which our Universe was formed. The differences in composition between various types of NEOs would also indicate how material has spread through the Solar System, and the orbits of these objects would hint at how the debris escaped from the Solar System. In the future, NEOs could be a potential mining resource: a few near-Earth asteroids are easier to reach (in terms of their differential velocity) than the Moon. Other areas that could benefit from the NEO detection programme include artificial satellite tracking and space-based surveillance, whose algorithm and system requirements are very similar.

In 1992, NASA organised international workshops to discuss the possible threat of NEOs and to suggest suitable future scientific research and possible strategic defence measures. The workshops recommended a concerted effort by the astronomical community, through a globally co-ordinated long-term search programme, with the prime objective of detecting and cataloguing these interplanetary fugitives. These recommendations have stimulated interest in Europe. Indeed, the recently established EUNEASO (European Near Earth Asteroid Search Observatory) project aims to provide a local network of French, German, Italian and Swedish observatories, as a first step towards a European contribution to a future worldwide network, as proposed in the Spaceguard Survey.

With hindsight and valuable empirical evidence from previous survey programmes (PCAS, PACS, AANEAS, Spacewatch, etc.), astronomers have realised that, besides the need for larger CCD sensor arrays in the imaging system, the main shortcoming of current observatory facilities for NEO detection has been the lack of sufficiently high-performance processing workhorses for the subsequent image processing. Computing performance, rather than the optical system, has been one of the main reasons for the low discovery rate of current detection systems, which are hence proving unsuitable for a comprehensive survey of the skies.

Accordingly, the NEO detection project at Aspex Microsystems Ltd. was established to study the feasibility of solving the NEO problem with a pioneering Modular-MPC (Massively Parallel Computer). This study follows the application requirement study for NEO detection conducted by Aspex Microsystems Ltd. [JOS94a].
2 Application Analysis


The Spaceguard Survey report [MOR92] has laid the foundation for most of the NEO detection research currently undertaken. The report recommended a network of 6 telescopes, each fitted with large-format CCDs arranged to use the focal plane efficiently, surveying at least 6000 sq.deg/month and detecting up to 30,000 objects/sq.deg.

The discovery rate depends on the number of asteroids available at a given limiting magnitude and is proportional to the primary sky coverage. Primary sky coverage is governed by the telescope's field of view and the overall efficiency of the detector or imaging medium. Another consideration is the limiting magnitude of the telescope/detector assembly. A fainter detection threshold seems desirable, in that the search volume is increased and fainter objects are sampled. To achieve fainter thresholds, larger-aperture telescopes and/or more sensitive detectors are required. A model of a whole-sky survey for NEO detection [MOR92] revealed that, to maintain discovery completeness and discovery rate without compromising warning times, large telescopes surveying large volumes of the sky at a limiting magnitude of less than 24 were necessary. Even with such compromises, survey periods of about 25 years were expected.

The detection of near-Earth objects involves the extraction of astronomical objects from images obtained from the CCD camera mounted on a telescope, and the recognition of these objects to give a list of possible asteroids that could then be classified as Earth-approachers or Earth-crossers. Asteroids are basically detected by virtue of their constant-velocity motion in space. Hence, these objects are best distinguished by comparing different frames of the same field, taken over a period of time. From the detections on different frames their orbital characteristics may be determined, and hence decisions can be made on whether the detected objects could be near-Earth objects. Non-stationary objects, and hence possible asteroids, are seen either as streaks in the images or through the apparent motion of objects between frames of the same field. The latter form of detection is the criterion most likely to establish the presence of NEOs.
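The frame-comparison criterion can be illustrated with a minimal sketch: given the positions of one candidate in three frames of the same field and the times at which the frames were taken, check whether its motion is consistent with a constant velocity. The function name, the pixel tolerance `tol` and the stationarity limit `min_shift` are hypothetical choices for illustration, not values from a real survey pipeline.

```python
import math


def is_constant_velocity(p1, p2, p3, t1, t2, t3, tol=0.5, min_shift=1.0):
    """Check whether three (x, y) detections of one candidate, taken at times
    t1 < t2 < t3, are consistent with uniform straight-line motion.

    tol:       allowed mismatch (pixels) at the middle frame (hypothetical)
    min_shift: displacement (pixels) below which the object is treated as
               stationary, i.e. a star or galaxy rather than an asteroid
    """
    # Objects that barely move between the first and last frame are stationary.
    if math.hypot(p3[0] - p1[0], p3[1] - p1[1]) < min_shift:
        return False
    # Velocity estimated from the first and last detections.
    vx = (p3[0] - p1[0]) / (t3 - t1)
    vy = (p3[1] - p1[1]) / (t3 - t1)
    # Position predicted for the middle frame under constant velocity.
    px = p1[0] + vx * (t2 - t1)
    py = p1[1] + vy * (t2 - t1)
    return math.hypot(p2[0] - px, p2[1] - py) <= tol


print(is_constant_velocity((10, 20), (20, 25), (30, 30), 0, 5, 10))      # True
print(is_constant_velocity((10, 20), (10.1, 20), (10, 20.1), 0, 5, 10))  # False: stationary
print(is_constant_velocity((10, 20), (25, 25), (30, 30), 0, 5, 10))      # False: not uniform
```

With a single column of CCDs, the three positions would come from repeated scans of the same field; with two or three columns they would come from the column images of one scan.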
2.1 The NEO detection system

Asteroid detection may be achieved with systems such as that shown in Fig. 2.1, comprising 6 major components: the imaging unit, the acquisition unit, the system controller, the image archives, the detection engine and the host. This fundamental configuration is typical of most current survey telescope systems. Examples of such systems are already in evidence in the Spacewatch program at Kitt Peak, Arizona, USA and in the NEO detection program at the Observatoire de la Côte d'Azur, Nice, France. Though the specifics might differ, their functionality and principles are similar. Admittedly, the features of the sensor and of the data acquisition are pivotal to the overall system requirements, which justifies their more extensive treatment in this section¹.


IMAGING UNIT


A Schmidt telescope and photosensitive sensors make up the imaging unit. The wide-aperture Schmidt telescope provides a large field of view compared with traditional Newtonian telescopes, making it ideal for a survey. In the past, the photosensitive elements popularly used by astronomers were photographic plates. Recently, however, photography has given way to charge-coupled devices (CCDs). A comparison of the detection characteristics of photography and CCDs [JAN87], in Table 2.1, shows that this trend is justified: CCDs have a higher detective quantum efficiency², a higher operational detective quantum efficiency³, and better spectral and dynamic range, stability and linearity.

The CCDs are housed in the focal plane of the telescope. Ideal as the CCD may appear, in the real world it has some limitations. A major problem is that CCD sizes are limited by the largest wafers available and by the low manufacturing yield of good-quality large-format CCDs. Even the larger CCDs, such as the 2k x 2k Loral, Kodak and Reticon CCDs, the 2k x 4k Loral CCD and the 4k x 4k Ford CCD, are too small to cover the focal plane.

¹ Specific details of the system are given with reference to the facilities at OCA.
² A measure of the number of photons actually detected by a sensor, given the total number of photons falling on it.
³ A measure of the detector efficiency with regard to actual image capture and the time after which useful information has been extracted and interpreted.


Table 2.1 Comparison of photography and charge-coupled devices


Also, large-format CCDs have prohibitively long read-out times, which can make them cost-ineffective as they reduce the overall telescope observation time. However, this limitation can be alleviated by using multiple read-out channels. Read-out times can also be improved, but at the expense of more noise. Techniques do exist to reduce these acquisition times while aiming to increase signal-to-noise ratios. Most CCDs now incorporate MPP⁴ and Super-MPP modes [AST94] to reduce dark current noise. The use of dual-stage output amplifiers can also increase read-out rates significantly; e.g. for the EG&G Reticon CCD RA2000JAU-22X [WIN94], an optional two-stage source follower increases the data rate from a maximum of 2 Megapixels/sec to 15 Megapixels/sec. Another method is to exploit the spectral characteristics of the scene to remove background intensity. Since a large portion of the light reflected by an asteroid is red, short-wavelength cut-off filters could be used to filter out background intensity.
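A quick calculation makes the read-out trade-offs concrete. The sketch below assumes a 2048 x 2048 device (the format of the RA2000JAU-22X is not stated here, so this is an assumption) and uses the 2 and 15 Megapixels/sec rates quoted above; the four-channel case is a hypothetical illustration of multi-channel read-out.

```python
def readout_time(n_rows, n_cols, pixel_rate, n_channels=1):
    """Seconds needed to read a full n_rows x n_cols frame when each of
    n_channels output channels delivers pixel_rate pixels per second."""
    return n_rows * n_cols / (pixel_rate * n_channels)


# Assumed 2048 x 2048 format; 2 and 15 Megapixels/sec are the rates quoted above.
for rate in (2e6, 15e6):
    print(f"{rate / 1e6:.0f} Mpix/s: {readout_time(2048, 2048, rate):.2f} s")
# Hypothetical four-channel read-out at the slower rate.
print(f"4 channels at 2 Mpix/s: {readout_time(2048, 2048, 2e6, n_channels=4):.2f} s")
```

This prints about 2.10 s, 0.28 s and 0.52 s respectively: per frame, the faster amplifier and multi-channel read-out recover telescope time of the same order as a short exposure.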

Astronomers have thus realised that, to make effective use of the imaging area in the focal plane, the solution is to create a large active CCD sensor area from a mosaic of smaller CCDs, e.g. the 30 CCDs in the photometric telescope used for the Sloan Digital Sky Survey [KEN92]. Previously, CCD chips were fabricated and packaged such that some dead space bordered the active pixel array. Thus, only an N x M array with considerable dead space in between, or a chequered array [MAU93a], as shown in Fig. 2.2, was possible. With the advent of edge-buttable devices, however, 1 x N and 2 x N [SEK92][LUP94][GEA91][STO94] configurations have become possible, with minimal dead space between the CCDs, thus maximising the use of the available imaging area in the focal plane.

Given the need to compare images of the same field, three schemes may be employed in the CCD arrangement, using either a 2 x N chequered or a 1 x N configuration. Ideally, 3 columns of CCD arrays should be used, so that moving objects can be detected and their motion verified to be of constant velocity. However, such a column configuration would limit detection to objects that move sufficiently fast to show differences between the individual column images. Another method is to use two columns and, for the third comparison, a standard database of the astronomical objects in that area of the sky. A single column of CCDs, scanning over a period of time before repeating observations of the same field, can also be used; two or three such observations would be necessary for detection. The advantage of the three- and two-column configurations is that they allow a larger survey area to be covered per day. However, the need for database access during the detection process may prove to be an added overhead, e.g. disk accesses and added functionality in either the software or the hardware. Given the sensitivity of observations to weather and "seeing" conditions, the ability to cover a larger area per day appears an attractive option, though at present it may be somewhat expensive.

⁴ MPP: Multi-Pinned Phase


Fig 2.2 CCD configurations

The CCD can be operated in 2 basic modes: stare and sidereal scan. The stare mode is the simplest mode of operation for the CCD. Here the CCD is exposed to an area of the sky for a given period of time, dependent on the type and limiting magnitude of the object observed. The CCD camera shutter is then closed and the data are read out while the telescope is repositioned for the next exposure. All the above configurations may be utilised in this mode; the only difference is the complexity of registering the individual CCD frames.
In the sidereal (or drift scan, or time-delay integration) mode, the rotation of the Earth is utilised instead of repositioning the telescope. By clocking the CCD appropriately, the charge packets collected in each pixel are transferred such that they move in unison with the astronomical objects above them. This scan can be achieved without closing the shutter, effectively hiding the read-out time. However, this mode has a directional constraint: read-out is only possible from the side of the CCD lying in the direction of the scan. Thus only the chequered and the 1 x N configurations are useful. In this mode, the scan automatically recovers the image that would be lost in the dead space of the chequered configuration, unlike the 1 x N array. A larger observable area is also possible.
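A toy one-column simulation shows why the synchronised clocking matters. It is a sketch under simplifying assumptions (one column, a point-like star covering one row, noise-free flux, integer time steps); the function name and parameters are hypothetical. With clocking, all the star's charge accumulates in one packet that is read out once; without it, the drifting star is smeared along the column.

```python
def drift_scan_column(n_rows, star_entry_time, flux, n_steps, clock=True):
    """Toy model of one CCD column observing a star that drifts down the
    column by one row per time step (the sidereal drift).

    clock=True:  drift-scan/TDI mode; all charge packets are shifted down one
                 row per step, in step with the image, and the bottom row is
                 read out through the serial register each step.
    clock=False: no clocking during the exposure; the column is read at the end.
    """
    charge = [0.0] * n_rows
    readout = []
    for t in range(n_steps):
        row = t - star_entry_time          # row currently under the star image
        if 0 <= row < n_rows:
            charge[row] += flux            # photons collected during this step
        if clock:
            readout.append(charge[-1])     # bottom packet enters the serial register
            charge = [0.0] + charge[:-1]   # shift every packet down by one row
    return readout if clock else charge


print(drift_scan_column(4, 2, 1.0, 10))               # [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print(drift_scan_column(4, 2, 1.0, 10, clock=False))  # [1.0, 1.0, 1.0, 1.0]
```

In the clocked case the star is integrated over all 4 rows (signal 4.0) and emerges as a single sample, which is the time-delay integration the text describes.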

However, the electronics of the acquisition unit function much faster than the sidereal rate and are hence forced into wait states after each serial register⁵ transfer. These wait states can be reduced by effectively increasing the sidereal rate: by moving the telescope along the direction of sidereal motion, the effective drift rate is increased. The price to pay is that read-out noise increases to levels that are sometimes unacceptable for the low-light scenes typical of astronomy. Nevertheless, this method appears to be the most suitable for a survey.
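The size of these wait states can be estimated from the sidereal drift rate of about 15.04 arcsec per second at the celestial equator, scaled by the cosine of the declination. The sketch below assumes a pixel scale of 1.0 arcsec/pixel, 2048-pixel rows and the 2 Megapixels/sec read-out rate quoted earlier; `speedup` is a hypothetical parameter modelling the faster effective drift obtained by moving the telescope.

```python
import math

SIDEREAL_DAY_S = 86164.0905          # length of a sidereal day in seconds
ARCSEC_PER_TURN = 360 * 3600         # arcseconds in a full rotation


def tdi_line_rate(pixel_scale_arcsec, dec_deg, speedup=1.0):
    """Rows per second the CCD must be clocked so that the charge follows the
    image drifting at declination dec_deg. `speedup` (hypothetical) models
    moving the telescope to raise the effective drift rate."""
    drift = ARCSEC_PER_TURN / SIDEREAL_DAY_S * math.cos(math.radians(dec_deg))
    return speedup * drift / pixel_scale_arcsec


# Assumed: 1.0 arcsec/pixel, field on the celestial equator, 2048-pixel rows.
rows_per_s = tdi_line_rate(1.0, 0.0)
pixels_per_s = rows_per_s * 2048
busy = pixels_per_s / 2e6            # fraction of time a 2 Mpix/s channel is busy
print(f"{rows_per_s:.2f} rows/s, {pixels_per_s / 1e3:.1f} kpix/s, busy {busy:.1%}")
```

This prints "15.04 rows/s, 30.8 kpix/s, busy 1.5%": under these assumptions the read-out electronics idle for most of each row period, which is the motivation for raising the effective drift rate.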

The acquisition of data from the CCD to the detection engine is a complicated affair. The CCD requires multi-phase clocking to effect the transfer of charge, though with virtual-phase clocking a single-phase clock would in principle suffice. Timing and wave-shaping circuits are necessary for these operations. These circuits are collectively termed the CCD port, as they provide the interface between the CCD and the rest of the system. The CCD port is also responsible for the analogue-to-digital conversion of the data. The CCD controller synchronises and governs the whole acquisition process. Several CCD controllers are now in existence, ranging from discrete circuit assemblies [GUN87] at the Palomar Observatory, USA, DSP-based designs [LEA88] at the Steward Observatory, USA and at O.C.A., France [MAU93b], and transputer-based designs [WAL90] at the Greenwich Observatory, UK, to programmable logic devices [HAN94] and even a combination of a transputer and a DSP [REI94] at the European Southern Observatory. However, their functionality is the same: providing the clocking sequences and voltage biases, albeit with varying programmability and flexibility. When a controller services more than one CCD, it should ensure the integrity of the data. It should also buffer the data before they can be utilised by the detection engine. As in any acquisition system, a data back-up or archive facility is also provided. Due to the large data rates associated with the survey, of the order of several Megapixels/sec as shown in Table 2.2, which compares two CCDs, it may be necessary to include another functional block to compress the data before storage.

⁵ The serial register is the read-out register of the CCD array.

With the large mosaic CCD imaging system necessary for NEO detection, acquisition control and synchronisation would overwhelm a single processor, suggesting the need for a hierarchical multi-controller scheme. Such a scheme would comprise individual CCD chips, or sub-arrays of CCD chips, each serviced by a local acquisition unit and controller, with a master controller servicing a group of these units. This arrangement implies the need for multi-channel controllers, as seen in Fig. 2.3. With this scheme, the local controllers would be primarily responsible for controlling the acquisition of CCD data from their sub-arrays and for the temporary storage of individual frames in their local buffers. They would also handle the archiving of these images on secondary storage systems. The synchronisation of the local sub-array controllers, and hence the acquisition of the entire sensor array, would be managed by the array controller. The array controller would have to maintain the integrity of the overall composite image and allow the detection engine access to the data.
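The array controller's integrity role can be sketched in a few lines: before tiling the frames held in the local buffers into one composite image, it checks that every sub-array has delivered a frame and that all frames belong to the same exposure. The `SubArrayFrame` record, the `exposure_id` tag and the equal-size tiling are hypothetical simplifications, not a description of any existing controller.

```python
from dataclasses import dataclass


@dataclass
class SubArrayFrame:
    """Frame delivered by a (hypothetical) local controller's buffer."""
    exposure_id: int
    row: int            # position of this sub-array in the mosaic grid
    col: int
    pixels: list        # list of rows, each a list of pixel values


def assemble_mosaic(frames, grid_rows, grid_cols):
    """Array-controller step: check integrity, then tile the sub-array frames.
    All sub-arrays are assumed to have the same size."""
    ids = {f.exposure_id for f in frames}
    if len(ids) != 1:
        raise ValueError(f"frames from different exposures: {sorted(ids)}")
    tiles = {(f.row, f.col): f.pixels for f in frames}
    missing = [(r, c) for r in range(grid_rows) for c in range(grid_cols)
               if (r, c) not in tiles]
    if missing:
        raise ValueError(f"missing sub-array frames: {missing}")
    height = len(frames[0].pixels)
    mosaic = []
    for r in range(grid_rows):
        for y in range(height):
            line = []
            for c in range(grid_cols):
                line.extend(tiles[(r, c)][y])   # append this sub-array's row
            mosaic.append(line)
    return mosaic


frames = [SubArrayFrame(7, 0, 0, [[1, 2], [3, 4]]),
          SubArrayFrame(7, 0, 1, [[5, 6], [7, 8]])]
print(assemble_mosaic(frames, 1, 2))   # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

In a real system the local controllers would run in parallel and only the consistency check and hand-over to the detection engine would be centralised.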


Table 2.2 Data requirements for a telescope with a 30 cm focal plane

The detection engine now has the daunting task of reducing these massive amounts of data so that astronomers can pin-point possible near-Earth asteroids. Given the large data volumes expected, the rapid simultaneous read-out from each CCD in the mosaic and the computation-intensive algorithms typical of image processing applications, massive computing capabilities are required. The inherent parallelism in image processing makes it well suited to massively parallel processing. In the application requirements study [JOS94a] it was shown that, given the massive data sizes typical in astronomical imaging, even a simple convolution would require several Giga-OPS (OPS = 16-bit additions per second). With more advanced sensors providing higher sensitivity and data rates of up to a few hundred megapixels per sensor, these requirements can indeed reach Tera-OPS.
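The Giga-OPS figure follows from simple arithmetic: a k x k convolution touches k² pixels for every output pixel. The sketch below counts one multiply-accumulate per kernel tap as one operation (an assumption; counting only 16-bit additions, as in the OPS definition above, would change the constant but not the order of magnitude) and assumes a hypothetical mosaic of 10 CCDs each read at 2 Megapixels/sec.

```python
def convolution_ops(pixel_rate, kernel_size):
    """Operations per second for a kernel_size x kernel_size convolution,
    counting one multiply-accumulate per kernel tap per pixel (assumption)."""
    return pixel_rate * kernel_size ** 2


# Hypothetical mosaic: 10 CCDs read out in parallel at 2 Megapixels/sec each.
mosaic_rate = 10 * 2e6
for k in (5, 11, 21):
    print(f"{k}x{k} kernel: {convolution_ops(mosaic_rate, k) / 1e9:.1f} Giga-OPS")
```

This prints 0.5, 2.4 and 8.8 Giga-OPS; the quadratic growth with kernel size, and the linear growth with the number of CCDs, is what pushes larger mosaics towards Tera-OPS.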

Besides the requirements on processing performance, other important constraints also exist. As larger telescopes are built, larger and more advanced mosaics will be fitted in the focal plane, and processing requirements will increase by a few orders of magnitude. Hence, the massively parallel processor should be scalable, to accommodate possible expansions of survey coverage and to meet the increase in processing performance requirements. Also, as astronomers persevere in developing better and more complex algorithms to extract more precise information, the MPC should remain flexible and programmable. All these requirements suggest that the Modular-MPC approach can provide the necessary performance, flexibility and overall end-user acceptability.

A host facility (Fig. 2.3) gives astronomers the necessary interface for controlling the entire NEO detection operation and hence should sit at the top of such a system. Through such a host, information on detected NEOs can also be communicated to the astronomical community via existing global networks such as the Internet. Such a facility would be essential when follow-up observations by other observatories around the world are needed to confirm dubious objects.


Fig  2.3  The  NEO  detection  system 
2.2 NEO detection

The NEO detection task can be decomposed into the sequence illustrated in Fig. 2.4. Image acquisition was discussed in the section above and has been included to give a more complete representation of the task sequence. The loop-back in the figure shows the need to repeat image conditioning, object enhancement, segmentation and analysis for repeated frames of the same field during the scan sequence, in order to classify objects as NEOs. This is only the case when a single column of CCDs is used. For two- or three-column configurations, such a loop-back would only be necessary when the field of interest for observation has changed. The Spacewatch program distinguishes only 2 stages: streak detection and motion detection. Streak detection covers the tasks from image conditioning to analysis, while motion detection is achieved in the detection task.

2.2.1 Image Conditioning

Image conditioning can be seen as the task of removing characteristics of the optical and sensing systems that have biased the process of image capture. Algorithms include CCD reduction [VAL88][MAS89] and image restoration algorithms.


Fig 2.4 The NEO detection task sequence

CCD-related noise components are additive and/or multiplicative. Standard CCD reduction operations include:

i. Row/column interpolation: the replacement of bad columns and lines caused by manufacturing defects in the CCD chips. Interpolation from neighbouring columns and rows is usually used for this.

ii. Flat fielding: this removes individual pixel-to-pixel sensitivity variations. Flat-field exposures may be obtained by exposing the CCDs to an illuminated white spot in the dome or, for good photometry, images taken at twilight are more appropriate. However, care should be taken, as illumination corrections may be necessary, especially with the former. Once the flat field has been obtained, it can be used for all observations made that day.

iii. Dark fielding: some amount of dark current will always exist. This step corrects for thermally generated electron-hole pairs. Dark fielding involves the subtraction of a bias level from the raw data and may be achieved by two methods: by utilising data from the overscan or prescan columns or lines, or by obtaining a dark image with the CCD shutter and/or telescope dome closed. Both methods involve certain compromises. The former provides only an approximation, as it relies on a calibration with data acquired simultaneously with the rest of the image. The latter uses a dark-field image acquired before the survey, which is representative of the localised dark current generation at the time the dark-field image was taken.

Other operations include the subtraction of a zero level, using a zero-length exposure calibration image, to remove preflash-induced non-zero initial counts, and division by an illumination image to correct illumination errors introduced while obtaining the correcting field images.
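The reduction steps above can be chained as in the following minimal sketch. It assumes the dark frame has already been scaled to the exposure time, that bad columns are isolated (no two adjacent), and that images are plain lists of rows; the function name is hypothetical, and real packages handle overscan fitting, scaling and masks in far more detail.

```python
def reduce_frame(raw, zero, dark, flat, bad_columns=()):
    """Basic CCD reduction sketch on images stored as lists of rows.

    1. subtract the zero-level (bias) frame and the dark frame
       (the dark is assumed already scaled to the exposure time);
    2. divide by the flat field normalised to a mean of 1;
    3. replace isolated bad columns by the mean of their neighbours.
    """
    height, width = len(raw), len(raw[0])
    flat_mean = sum(sum(row) for row in flat) / (height * width)
    out = [[(raw[y][x] - zero[y][x] - dark[y][x]) / (flat[y][x] / flat_mean)
            for x in range(width)] for y in range(height)]
    for x in bad_columns:
        for row in out:
            if 0 < x < width - 1:
                row[x] = 0.5 * (row[x - 1] + row[x + 1])   # interior column
            elif x == 0:
                row[x] = row[1]                             # left edge
            else:
                row[x] = row[x - 1]                         # right edge
    return out


raw = [[60, 999, 160], [60, 999, 160]]          # column 1 is a bad column
zero = [[10, 10, 10], [10, 10, 10]]
dark = [[0, 0, 0], [0, 0, 0]]
flat = [[0.5, 1.0, 1.5], [0.5, 1.0, 1.5]]       # pixel-to-pixel sensitivity
print(reduce_frame(raw, zero, dark, flat, bad_columns=[1]))
# [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
```

The example recovers a uniform sky of 100 counts from a frame distorted by a bias offset, unequal pixel sensitivities and a bad column.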

2.2.2  Image  Enhancement 

Most image processing solutions are based on an approach in which an image model is chosen to best represent the characteristics of the object of interest, to aid in its detection or recognition. Most objects in an astronomical image span only a few pixels and are characterised by a Gaussian-like intensity profile rising above the average background intensity. Consequently, astronomical models have been based on these characteristics.

The classical astronomical model is based on an image, I(x,y), made up of a slowly varying background with superimposed small-scale astronomical objects, each associated with a point of maximum intensity. Then,

$$ I(x,y) = a\,p(x,y) + \mu \qquad \text{(Eq. 4.1)} $$

where a is the amplitude of the object, p(x,y) is the profile of the model object and μ is the mean of the background variation.

Astronomical sources are defined by their radial profiles and are characterised by a circular, bivariate Gaussian profile. Hence,

$$ p(\Delta x, \Delta y) = e^{-\frac{\Delta x^2 + \Delta y^2}{2\sigma^2}} \qquad \text{(Eq. 4.2)} $$

where (Δx, Δy) is the distance from the centroid of the object, and σ is related to the size of the astronomical object, taken as the full width at half maximum (FWHM = 2√(2 ln 2) σ ≈ 2.355σ), over the range −3σ < Δx, Δy < 3σ.
This derives from the consideration that stars and other astronomical sources are point sources, i.e. Dirac delta functions, broadened by the atmosphere and the optics. Most object detection projects, e.g. DAOPHOT⁶, the NEO project at O.C.A. and SDSS⁷, have adopted this model.

Since asteroids and star-like objects lie in the high spatial frequency range, enhancement may be achieved using high-pass filters, which enhance fine-resolution features and suppress long-wavelength fluctuations. However, the highest frequencies are also the most affected by noise, so these filters also enhance the noise components.

The filter for an astronomical image would then have to be a band-pass filter, p'(x,y), such that the restoration provides the amplitude a, or a value as close to it as possible. Thus,

$$ \sum_x \sum_y p'(x,y)\, I(x,y) = a \qquad \text{(Eq. 4.3)} $$

At O.C.A., p'(x,y) is determined by the method of least squares, using a Gaussian model for p(x,y), to give a matched filter (Eq. 4.4) in the shape of a "Mexican Hat" (see Fig 2.5) [BU92], while DAOPHOT uses a truncated Gaussian function whose integral is zero.

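$$ p'(x,y) = \frac{S_0\, p(x,y) - S_1}{\Delta} \qquad \text{(Eq. 4.4)} $$

which follows from minimising $\sum_x\sum_y \left[I(x,y) - a\,p(x,y) - \mu\right]^2$ with respect to a and μ, where $S_0 = \sum_x\sum_y 1$ (the number of pixels in the filter support), $S_1 = \sum_x\sum_y p(x,y)$, $S_2 = \sum_x\sum_y p^2(x,y)$ and $\Delta = S_2 S_0 - S_1^2$.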

Since the Gaussian model p(x,y) depends on the object size, which is taken to represent the "seeing" conditions, these filters must be adapted to each exposure, based on an estimated full width at half maximum (FWHM).
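Solving the least-squares problem of Eq. 4.1 for a and μ gives a linear filter of the form p'(x,y) = (S0 p(x,y) − S1)/Δ, with S0 the number of pixels, S1 = ΣΣ p, S2 = ΣΣ p² and Δ = S2 S0 − S1². The sketch below builds this filter for a Gaussian model on a square support and checks that Eq. 4.3 recovers the amplitude of a synthetic object whatever the background level. It assumes σ is known exactly and the object is centred on a pixel; the function names are hypothetical, and this is a derivation-based illustration rather than the O.C.A. implementation.

```python
import math


def matched_filter(sigma, half_width):
    """Least-squares filter p'(x, y) = (S0 * p - S1) / Delta for a Gaussian
    model p on a (2 * half_width + 1)^2 support."""
    coords = range(-half_width, half_width + 1)
    p = [[math.exp(-(x * x + y * y) / (2 * sigma ** 2)) for x in coords] for y in coords]
    s0 = len(p) * len(p[0])                       # number of pixels
    s1 = sum(sum(row) for row in p)               # sum of p
    s2 = sum(v * v for row in p for v in row)     # sum of p^2
    delta = s2 * s0 - s1 ** 2
    return [[(s0 * v - s1) / delta for v in row] for row in p]


def amplitude(image, filt, xc, yc):
    """Eq. 4.3: a = sum of p'(x, y) * I(x, y) over the support centred on (xc, yc)."""
    h = len(filt) // 2
    return sum(filt[j][i] * image[yc - h + j][xc - h + i]
               for j in range(len(filt)) for i in range(len(filt)))


sigma, hw = 1.3, 4
filt = matched_filter(sigma, hw)
# Synthetic object: amplitude 50 on a background of 200 (Eq. 4.1).
image = [[50.0 * math.exp(-((x - 4) ** 2 + (y - 4) ** 2) / (2 * sigma ** 2)) + 200.0
          for x in range(9)] for y in range(9)]
print(round(amplitude(image, filt, 4, 4), 6))           # 50.0
print(abs(sum(v for row in filt for v in row)) < 1e-9)  # True: the background cancels
```

Because the filter weights sum to zero and their inner product with p is one, the background μ drops out and the amplitude a is returned exactly for a noise-free object.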

Detections due to non-astronomical sources, such as cosmic-ray events, should also be removed at this stage. The median filter may be used for this.
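One simple way to use the median filter for this is sketched below: a pixel is replaced by the median of its 3x3 neighbourhood when it exceeds that median by more than k times the noise. The threshold k = 5, the Poisson noise estimate and the function name are assumptions for illustration. Note that the peaks of bright, well-sampled stars can also exceed their local median by a large margin, so in practice such a test is combined with shape criteria like the sharpness and roundness measures discussed under segmentation.

```python
import statistics


def remove_cosmic_rays(image, k=5.0, noise=None):
    """Median-filter cosmic-ray cleaning on an image stored as a list of rows.

    A pixel is replaced by the median of its 3x3 neighbourhood when it exceeds
    that median by more than k * noise. If `noise` is None, Poisson noise is
    assumed and estimated as the square root of the local median.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]                 # leave the input untouched
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            med = statistics.median(window)
            sigma = noise if noise is not None else max(med, 1.0) ** 0.5
            if image[y][x] - med > k * sigma:
                out[y][x] = med
    return out


sky = [[100.0] * 5 for _ in range(5)]
sky[2][2] = 1000.0          # single-pixel cosmic-ray hit
sky[0][0] = 110.0           # ordinary noise fluctuation
clean = remove_cosmic_rays(sky)
print(clean[2][2], clean[0][0])   # 100.0 110.0
```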


Fig 2.5 The "Mexican Hat" filter


⁶ DAOPHOT: Dominion Astrophysical Observatory program for crowded-field stellar PHOTometry
⁷ SDSS: Sloan Digital Sky Survey
2.2.3 Segmentation

The task of segmentation involves separating an input image into its constituent parts or objects, so that objects of interest can be labelled and differentiated from other objects and from background pixels. This depends very much on the characteristics of the image, hence the need for the previous steps to enhance these characteristics. Objects need to be separated as individual objects before they can be analysed and a decision made. Ideally, pixels belonging to the same object should be connected to form a single object definition.

Segmentation can be adequately achieved by thresholding. The threshold value can be determined by considering the fluctuations in amplitude due to the filtering operation. Since the amplitude after filtering is given by Eq. 4.3, its variance, if the noise in independent pixels is taken to follow a Poisson distribution (so that the variance of each pixel equals its expected value ⟨I(x,y)⟩), is

$$ \sigma^2(a) = \sum_x \sum_y \langle I(x,y) \rangle\, p'^2(x,y) $$

and, since the background dominates (⟨I(x,y)⟩ ≈ μ),

$$ \sigma^2(a) \approx \mu \sum_x \sum_y p'^2(x,y) $$

For effective detection, the threshold should then be set at approximately

$$ a > k\,\sigma(a) $$

with k = 3. Hence,

$$ a > 3\,\sigma(a) $$

Values of k may be chosen according to the required error margins. Higher values of k may result in detection failures (failure to detect faint objects that could be classed as NEOs), while lower values may result in detection errors (detection of false objects).
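The threshold rule can be applied directly to a list of filtered amplitudes. In the sketch below the 3x3 zero-sum kernel is a hypothetical stand-in for p'(x,y) (in practice the least-squares filter adapted to the seeing would be used), the background level μ = 400 is an assumed value, and the candidate list is invented for illustration.

```python
import math


def detection_threshold(filt, mu, k=3.0):
    """k * sigma(a), where sigma^2(a) = mu * sum of p'(x, y)^2: Poisson noise,
    independent pixels and a background-dominated image are assumed."""
    return k * math.sqrt(mu * sum(v * v for row in filt for v in row))


def detect(amplitudes, threshold):
    """Keep the (x, y, a) entries whose filtered amplitude a exceeds the threshold."""
    return [(x, y, a) for (x, y, a) in amplitudes if a > threshold]


# Hypothetical zero-sum 3x3 kernel standing in for p'(x, y); background mu = 400.
filt = [[0.0, -0.25, 0.0],
        [-0.25, 1.0, -0.25],
        [0.0, -0.25, 0.0]]
thr = detection_threshold(filt, mu=400.0)
print(round(thr, 2))                       # 67.08
print(detect([(10, 12, 150.0), (40, 7, 50.0), (3, 30, 70.0)], thr))
# [(10, 12, 150.0), (3, 30, 70.0)]
```

Raising k to 5 in this example would raise the threshold to about 112 and drop the faint candidate at (3, 30), illustrating the trade-off between detection failures and false detections.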


After filtering for stars or star-like objects, the enhanced central points correspond to the points of maxima of each object. All points of maxima found in the full image can then be considered and labelled as possible asteroids. These points of maxima should be determined as accurately as possible, to about 1/100th of a pixel, since even minute errors could introduce significant errors during astronomical co-ordinate reduction and in follow-up operations. Other characteristics are also used: DAOPHOT uses sharpness and roundness criteria, while FOCAS uses a count of the number of pixels within the star. These methods can be useful in rejecting non-stellar detections, including cosmic-ray events and many cosmetic flaws, which tend to be either narrower or broader than a seeing-broadened stellar image. The roundness criterion may also isolate extended objects caused by overflow columns or rows from grossly over-exposed objects in the frame.
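Finding the points of maxima can be sketched as a search for pixels that exceed a threshold and are strictly brighter than all eight neighbours. This is a plain local-maximum search, not DAOPHOT's sharpness and roundness tests or FOCAS's pixel count. For simplicity it skips border pixels. The name `local_maxima` is hypothetical.

```python
def local_maxima(image, threshold):
    """Return (x, y) of pixels above `threshold` that are strictly greater
    than all 8 neighbours; border pixels are skipped for simplicity."""
    peaks = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            value = image[y][x]
            if value <= threshold:
                continue
            neighbours = [image[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(value > n for n in neighbours):
                peaks.append((x, y))
    return peaks

image = [[0, 0, 0, 0, 0],
         [0, 9, 2, 0, 0],
         [0, 2, 1, 2, 0],
         [0, 0, 2, 7, 0],
         [0, 0, 0, 0, 0]]
print(local_maxima(image, 5))   # [(1, 1), (3, 3)]
```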

Fast-moving asteroids may be detected as streaks; operators such as the Hough transform could then be used for their analysis.
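As an illustration of how a Hough transform picks out a streak, the sketch below lets each detected point vote for every line x·cos θ + y·sin θ = ρ passing through it. The best-supported (θ, ρ) cell then gives the streak direction. The angular and ρ resolutions, and the function name `hough_lines`, are assumptions chosen for the example.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes for lines x*cos(theta) + y*sin(theta) = rho.

    Returns (theta_degrees, rho, votes) for the best-supported line.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta          # 0 <= theta < 180 degrees
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    (t, r), count = votes.most_common(1)[0]
    return math.degrees(math.pi * t / n_theta), r * rho_step, count

# a streak along y = x plus two unrelated detections
points = [(i, i) for i in range(10)] + [(3, 7), (8, 2)]
angle, rho, count = hough_lines(points)
# angle lies within a few degrees of 135, rho is 0 and count is 10
```

With coarse ρ bins, several neighbouring angles collect the same votes for a short streak. A finer ρ step sharpens the angle estimate at the cost of a larger accumulator.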

2.2.4  Analysis 

At this point, processing concentrates on objects rather than pixels. The task of analysis is to provide a proper representation and description of the detected objects for subsequent processing. Parameters of interest in astronomy for the description of objects are those related to object intensity magnitude, object intensity profile characteristics and object location.

Object intensity magnitudes can be represented together with the intensity profile by determining the second-order geometric moments about the centre of the detected object (given by the point of maximum at $(x_c, y_c)$). The geometric moments of order (p+q) of I(x,y) are defined as

$$M_{p,q} = \sum_{x}\sum_{y} (x - x_c)^{p}\,(y - y_c)^{q}\, I(x,y)$$

where x and y are the co-ordinates of the object's pixels in the image.

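The object profile may be represented by determining the major and minor axes of the object's intensity profile. Assuming the object's axes are aligned with the image axes, the semi-major and semi-minor axes A and B of the equivalent ellipse satisfy

$$A B = \sqrt{16\,\overline{x^{2}}\,\overline{y^{2}}}$$

where $\overline{x^{2}}$ and $\overline{y^{2}}$ are the second-order moments in x and y, normalised by the total intensity of the object:

$$\overline{x^{2}} = \frac{\sum_{x}\sum_{y}(x - x_c)^{2}\, I(x,y)}{\sum_{x}\sum_{y} I(x,y)}$$

$$\overline{y^{2}} = \frac{\sum_{x}\sum_{y}(y - y_c)^{2}\, I(x,y)}{\sum_{x}\sum_{y} I(x,y)}$$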
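The moment definitions above translate directly into code. This sketch assumes, as the equations do, that the object's axes lie along the image x and y axes. It uses A = 2√(x̄²) and B = 2√(ȳ²), the relation for a uniform ellipse, which reproduces AB = √(16 x̄² ȳ²). The names `moment` and `ellipse_axes` are hypothetical.

```python
import math

def moment(image, p, q, xc, yc):
    # M_{p,q} = sum_x sum_y (x - xc)^p (y - yc)^q I(x, y)
    return sum((x - xc) ** p * (y - yc) ** q * value
               for y, row in enumerate(image)
               for x, value in enumerate(row))

def ellipse_axes(image, xc, yc):
    """Semi-axes (A, B) of the equivalent ellipse, assuming the object's
    axes are aligned with the image x and y axes."""
    m00 = moment(image, 0, 0, xc, yc)
    x2 = moment(image, 2, 0, xc, yc) / m00    # normalised second moment in x
    y2 = moment(image, 0, 2, xc, yc) / m00    # normalised second moment in y
    a, b = 2.0 * math.sqrt(x2), 2.0 * math.sqrt(y2)
    return max(a, b), min(a, b)               # A*B = sqrt(16 * x2 * y2)

image = [[0, 0, 1, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 1, 0, 0]]
A, B = ellipse_axes(image, 2, 1)
print(A / B)   # aspect ratio sqrt(5), about 2.236
```

The ratio A/B is the aspect-ratio descriptor mentioned below. An object with rotated axes would also need the cross moment M_{1,1}, which this sketch ignores.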

A simpler method of determining the centroid fits a function of the form $ax^{2}+bx+c$ to the intensities around the point of maximum; the position of maximum intensity is then given as a function of (a,b,c), namely $x = -b/(2a)$. Other intensity profile descriptors include:

other and higher-order moments;

aspect ratio: the ratio of the object's major-axis length to its minor-axis length;

asymmetry: the rms fractional change in an object's intensity profile when reflected through its minor axis;

uniformity: the rms deviation of an object's intensity profile, along its major axis, from a template object of the same size and the same mean intensity;

goodness of fit: the rms deviation of the object's intensity profile from a template object of the same orientation and peak intensity, normalised to the peak intensity of the object.

Most of these parameters have been used by the Spacewatch program, FOCAS, and the NEO project at O.C.A.
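The parabolic centroid can be sketched with three samples at x = -1, 0, +1 around the integer-pixel maximum. Fitting ax² + bx + c through them gives a = (f₋ + f₊ - 2f₀)/2 and b = (f₊ - f₋)/2, so the peak lies at -b/(2a). Applying the fit separately along the row and the column is an assumption of this sketch, as are the names `parabolic_peak` and `subpixel_centre`.

```python
def parabolic_peak(f_minus, f_zero, f_plus):
    """Sub-pixel offset of the maximum of a*x^2 + b*x + c fitted through
    samples at x = -1, 0, +1 (the integer-pixel maximum is at x = 0)."""
    a = (f_minus + f_plus - 2.0 * f_zero) / 2.0
    b = (f_plus - f_minus) / 2.0
    if a >= 0:
        raise ValueError("samples do not bracket a maximum")
    return -b / (2.0 * a)

def subpixel_centre(image, x0, y0):
    # separate 1-D fits along the row and the column through (x0, y0)
    row, col = image[y0], [r[x0] for r in image]
    dx = parabolic_peak(row[x0 - 1], row[x0], row[x0 + 1])
    dy = parabolic_peak(col[y0 - 1], col[y0], col[y0 + 1])
    return x0 + dx, y0 + dy

# samples of 10 - (x - 0.3)^2 at x = -1, 0, 1
print(parabolic_peak(8.31, 9.91, 9.51))   # approximately 0.3
```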

Additionally, DAOPHOT and FOCAS use synthetic aperture photometry to remove the background intensity before thresholding. Two concentric regions are taken around a detected object. The first synthetic aperture, equivalent in size to the FWHM of the object, measures the total brightness due to that stellar object. The second is taken several times larger than the first and measures the background intensity as the mode of the pixel values in that region. This background intensity, multiplied by the number of pixels within the first region, is then subtracted from the total brightness to correct for all sources other than the stellar object.
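The background correction described above can be sketched as follows. This is a simplified version of the idea, not DAOPHOT's or FOCAS's actual code. It assumes a circular aperture of radius `r_ap` and a background annulus between `r_in` and `r_out`, all hypothetical parameters, and it takes the background as the mode of the annulus pixels.

```python
import math
import statistics

def aperture_photometry(image, xc, yc, r_ap, r_in, r_out):
    """Background-corrected brightness of an object centred at (xc, yc).

    Pixels with distance < r_ap form the object aperture; pixels with
    r_in <= distance < r_out form the background annulus.
    """
    total, n_ap, sky = 0.0, 0, []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d = math.hypot(x - xc, y - yc)
            if d < r_ap:
                total += value
                n_ap += 1
            elif r_in <= d < r_out:
                sky.append(value)
    background = statistics.mode(sky)   # per-pixel background (mode of annulus)
    return total - background * n_ap    # remove background from every aperture pixel

image = [[10] * 9 for _ in range(9)]    # flat background of 10 counts
image[4][4] += 50                        # object peak
for x, y in [(3, 4), (5, 4), (4, 3), (4, 5)]:
    image[y][x] += 10                    # object wings
print(aperture_photometry(image, 4, 4, 1.5, 3, 4.5))   # 90.0
```

The mode is robust against a few bright neighbouring stars in the annulus. `statistics.mode` raises an error if the annulus contains no pixels.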

2.2.5  Decision 

Decisions resulting from the recognition of possible NEOs are made by comparing images of the same area of sky to extract objects that could be asteroids. Asteroids can be distinguished by their apparent motion at almost constant velocity. Hence, at this stage a further classification of detected objects can be performed by marking static and non-static objects; these can then be further classified as asteroids or dubious objects.

Before classification can be undertaken, the effects of telescope pointing errors and atmospheric refraction have to be compensated, since successive frames will not cover precisely the same region of sky. Because of these errors, a stationary object will not appear at precisely the same location in each image. However, the change in position is the same for every stationary object in the same scan. Registration can therefore be achieved by determining the offset of known stationary objects between frames: a list of distances between the objects in the 1st and 2nd object lists is generated, and its median is taken as the offset.
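The median-offset registration can be sketched as below. The sketch assumes that every pair of detections closer than a hypothetical `max_shift` in both co-ordinates contributes one candidate offset. Stationary objects all share the same offset, so they dominate the median, while the few moving objects and chance pairings are outvoted. The name `frame_offset` is hypothetical.

```python
import statistics

def frame_offset(list1, list2, max_shift):
    """Median (dx, dy) between object positions in two frames."""
    dxs, dys = [], []
    for x1, y1 in list1:
        for x2, y2 in list2:
            dx, dy = x2 - x1, y2 - y1
            if abs(dx) <= max_shift and abs(dy) <= max_shift:
                dxs.append(dx)
                dys.append(dy)
    return statistics.median(dxs), statistics.median(dys)

stars1 = [(10, 10), (50, 20), (30, 70), (80, 40)]
stars2 = [(x + 2, y - 1) for x, y in stars1]        # frame shifted by (2, -1)
frame1 = stars1 + [(60, 60)]                         # plus a moving object
frame2 = stars2 + [(64, 63)]
print(frame_offset(frame1, frame2, max_shift=5))    # (2, -1)
```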

Classification may be achieved either by using a database of symbolic data describing the properties of detected objects, or by image comparison. The use of a database offers a considerable advantage in that it reduces the processed data from numerical to symbolic form. With image comparison methods, by contrast, the redundant information present in object-less regions results in unnecessary processing overheads.

Whichever scheme is used, the basic recognition operations are similar. The main objective of the algorithm is to detect motion of objects and then verify whether this motion is at constant velocity. This is achieved by comparing two or three separate frames of the same field, taken at given time intervals, to locate objects that are present in all the frames and are moving steadily across the field.
Firstly, lists of the objects detected in each frame are created. Objects that are not present at similar locations in both of the first two lists are classified as non-static objects. The search area is chosen on the premise that, since moving asteroids are assumed to have a constant velocity, the change in co-ordinates should be approximately constant, so the search area should increase linearly from the second to the third frame. The window within which this search is made depends on the maximum expected asteroid speed and the time lapse between exposures. On comparison with the list generated from a third image of the same area, possible asteroids can be differentiated from other dubious objects. Here the search is made within a smaller window, as the expected region is known after the first comparison. When only two scans of the same field are obtained, a standard star catalogue* may be used to isolate non-stationary objects. The second frame can then be used to determine whether the object was moving at constant velocity.
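The three-frame comparison can be sketched as follows. A first, wide window is bounded by the maximum expected speed. Each candidate pair then predicts a constant-velocity position in the third frame, and a small window of radius `tol` is searched around that prediction. The sketch assumes the object lists are already registered and contain only non-static objects. The function name and its parameters are hypothetical.

```python
def link_tracks(f1, f2, f3, t1, t2, t3, max_speed, tol):
    """Find (p1, p2, p3) position triples consistent with constant velocity.

    f1, f2, f3: lists of (x, y) positions at times t1 < t2 < t3.
    """
    dt12, dt23 = t2 - t1, t3 - t2
    reach = max_speed * dt12            # first window: maximum expected motion
    tracks = []
    for x1, y1 in f1:
        for x2, y2 in f2:
            if abs(x2 - x1) > reach or abs(y2 - y1) > reach:
                continue
            vx, vy = (x2 - x1) / dt12, (y2 - y1) / dt12
            px, py = x2 + vx * dt23, y2 + vy * dt23     # constant-velocity prediction
            for x3, y3 in f3:
                if abs(x3 - px) <= tol and abs(y3 - py) <= tol:   # smaller window
                    tracks.append(((x1, y1), (x2, y2), (x3, y3)))
    return tracks

f1 = [(10, 10), (50, 50)]
f2 = [(13, 14), (52, 48)]
f3 = [(16, 18), (70, 70)]
print(link_tracks(f1, f2, f3, 0, 1, 2, max_speed=5, tol=1))
# [((10, 10), (13, 14), (16, 18))]
```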

The orbital elements of these recognised asteroids are calculated and then used to further classify them into specific classes of asteroids, e.g. Amor, Aten or Apollo.
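As an example of this last step, the sketch below uses the commonly quoted class boundaries, which are not given in this report. The classes are Aten (a < 1.0 AU and aphelion Q > 0.983 AU), Apollo (a > 1.0 AU and perihelion q < 1.017 AU) and Amor (a > 1.0 AU and 1.017 < q < 1.3 AU). It only needs the semi-major axis a and eccentricity e from the computed orbital elements. The name `neo_class` is hypothetical.

```python
def neo_class(a, e):
    """Classify an orbit from semi-major axis a (AU) and eccentricity e."""
    q = a * (1.0 - e)   # perihelion distance
    Q = a * (1.0 + e)   # aphelion distance
    if a < 1.0 and Q > 0.983:
        return "Aten"
    if a > 1.0 and q < 1.017:
        return "Apollo"
    if a > 1.0 and 1.017 < q < 1.3:
        return "Amor"
    return "other"      # outside these three classes

print(neo_class(1.5, 0.2))   # Amor (q = 1.2 AU)
```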


* Star catalogue: a catalogued collection of known stars.
3 Modular-MPC architecture

An Associative String Processor (ASP) based modular massively parallel computer (Modular-MPC) is currently being studied as an architecture suitable for meeting the requirements of NEO detection. The ASP architecture [LEA88] has demonstrated cost-effective, high-performance support of image processing applications [KRI91] while exploiting state-of-the-art microelectronics technology. ASP exploits the opportunities presented by the latest advances in the VLSI-to-ULSI-to-WSI technology trend. It also makes use of continually improving high-density system assembly techniques. Because the ASP architecture is technology-independent, it can benefit from the inevitable improvements in microelectronics technology without architectural modification.

3.1  Modular-MPC  concept 

Image processing applications are inherently data-parallel and are well suited to massively parallel processing. However, despite the promising potential of current MPCs, architectural limitations have constrained the integration of image processing applications, as MPCs have lacked the flexibility to match task program complexity and data evolution. This has led to the development of second-generation architectures that use SIMD and/or MIMSIMD configurations to provide task-pipelined and/or task-parallel application solutions through multiple task modules.

The  Modular-MPC  strategy  has  been  to  develop  an  architecture  that  provides 

i.  application  flexibility 

Machine versatility as well as performance scalability, achieved through application-specific configurations of generic hardware modules and software modules in order to match the natural parallelism of the application.

User acceptability, gained by providing many familiar user environments which adapt to the specific needs of different users.

ii. cost-effectiveness

Size, weight, power and cost are significantly reduced, compared to current MPCs, through the consistent use of microelectronics. Application-specific configurations are built from generic modules, which can be mass-produced and are based on mass-produced microelectronics components (e.g. RAMs, microprocessors, FPGAs and ASP modules).

Operational efficiency is secured through Modular-MPC functionality and the efficiency of ASP modules.

Future proofing of the Modular-MPC is "inherited" from the technology road-map for high-volume off-the-shelf components (e.g. memories, microprocessors, FPGAs) and from ASP module technology upgrades.




3.2 Modular-MPC methodology

Most image processing applications require the execution of a sequence of task packages. Tasks control the evolution of specified sub-images and are typically application-specific in nature; the more complex ones usually comprise a hierarchy of composition-tasks and, at the lowest level, base-tasks. Base-tasks are executed as a sequence of general-purpose processes, which control the navigation and evolution of specified data structures, as indicated below for typical process examples. Processes are executed as a sequence of primitive operations (e.g. +, -, ×, AND, OR, <, =, > etc.), which control the evolution of pixels.
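The hierarchy just described (task packages, composition-tasks, base-tasks, processes, primitive operations) can be modelled as a simple tree. Walking it depth-first yields the primitive operations in execution order. The class names `Task` and `Process`, the function `primitive_stream` and the example task names are all hypothetical, chosen only to illustrate the structure; they are not Modular-MPC software modules.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union

@dataclass
class Process:
    name: str
    primitives: List[str]                  # primitive operations on pixels

@dataclass
class Task:
    name: str                              # composition-task or base-task
    children: List[Union["Task", Process]] = field(default_factory=list)

def primitive_stream(node) -> Iterator[str]:
    """Yield primitive operations in execution order (depth-first walk)."""
    if isinstance(node, Process):
        yield from node.primitives
    else:
        for child in node.children:
            yield from primitive_stream(child)

package = Task("detect", [
    Task("filter", [Process("correlate", ["x", "+"])]),
    Task("threshold", [Process("compare", [">"])]),
])
print(list(primitive_stream(package)))   # ['x', '+', '>']
```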

The Modular-MPC methodology is based on the following steps, shown in Figure 3.1, carried out for the particular requirements (e.g. frame size and frame rate) of an application.


Fig 3.1 Modular-MPC methodology (requirement analysis; analysis of natural parallelism; synthesis of applied parallelism; cost/performance optimisation; prototype; demonstration)


Analysis of the natural parallelism: Task packages, tasks and processes are identified. A flow-graph exposes opportunities for control-level parallelism. Subsequently, the sub-images associated with task packages and tasks, as well as the data structures associated with processes, are identified. The size of the sub-images and data structures exposes opportunities for data-level parallelism.
Synthesis of the applied parallelism: Modular-MPC software modules are installed to match the identified tasks and processes. The algorithm is then functionally verified on a general Modular-MPC. Subsequently, a configuration of hardware modules for a specific Modular-MPC which matches the natural parallelism is derived and optimised.

3.3 Hardware architecture

Figure  3.2  shows  the  high-level  architecture  of  the  Modular-MPC.  It  is  partitioned  into  three  main  functional 
blocks: 

Massively  Parallel  Processor  (MPP),  where  parallel  processing  takes  place 

Data  Stream  Manager  (DSM),  which  supports  parallel  data  transfer 

Instruction Stream Manager (ISM), which supports sequential data transfer and control


Fig  3.2  High  level  architecture  of  the  Modular-MPC 


It can be observed that the natural parallelism of image processing applications is often characterised by massive data-level parallelism (e.g. all pixels in a sub-image can be processed in parallel) and modest control-level parallelism between task packages or tasks. The common MPC architectures, MIMD and SIMD, exploit either control-level or data-level parallelism, but cannot simultaneously make use of the other level of parallelism. Therefore, Multiple Instruction control of Multiple SIMD (MIMSIMD), which can exploit massive data-level parallelism as well as modest control-level parallelism, has been employed for the Modular-MPC.

A SIMD ensemble of PEs implements a Task Execution Unit (TEU). Hence, the MIMSIMD configuration is implemented by a number of TEUs, each executing task packages. Depending on the control-level parallelism, the hierarchy of tasks in a task package can either be executed sequentially in one TEU or be spread over several TEUs. In the latter case, modest control-level parallelism is exploited by a number of TEUs working in parallel. The actual number of TEUs depends on a cost-effective compromise between temporal and spatial parallelism that achieves the minimum cost for given performance requirements; finding this compromise requires the two forms of parallelism to be balanced against each other.
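The report does not say how the number of TEUs is chosen, so the following is only a hypothetical sketch of the "minimum cost for given performance" idea. It treats cost as proportional to the number of TEUs and uses the longest-processing-time-first greedy heuristic to spread task execution times over the TEUs. It then picks the smallest TEU count whose schedule fits within a required frame time. The heuristic is not guaranteed optimal, and the model does not capture the temporal/spatial trade-off itself. All names and numbers are illustrative.

```python
import heapq

def makespan(task_times, n_teus):
    # longest-processing-time-first: give each task to the least-loaded TEU
    loads = [0.0] * n_teus
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)

def min_teus(task_times, frame_time):
    """Smallest number of TEUs whose greedy schedule meets the frame time."""
    for n in range(1, len(task_times) + 1):
        if makespan(task_times, n) <= frame_time:
            return n
    return None   # even one TEU per task is too slow

print(min_teus([5, 4, 3, 2], frame_time=7))   # 2
```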

In order to exploit the control-level parallelism, a multi-task controller generates a private control stream for each Task Execution Unit (TEU). TEUs are connected via two routers: parallel data is exchanged via the Inter-Task Parallel Data Router (ITPDR), and sequential data via the Inter-Task Sequential Data Router (ITSDR).

Finally, TEUs can communicate with the outside via a number of interfaces. While sequential data can only be exchanged with the outside via the host interface, which connects the Modular-MPC with the host workstation, a choice of interfaces is available for parallel data I/O:

host interface: In cases of moderate performance requirements for parallel data I/O, the transfer can be handled by the host workstation via the host interface.

external I/O interface: For high-speed parallel data I/O and for specific data transfer protocols (e.g. HiPPI, SCSI), a number of external I/O interfaces is provided.

monitor interface: The specific requirements of parallel data output to high-resolution displays are handled by a monitor interface, which provides all the necessary functionality (e.g. frame store, D/A converter) to interface directly to a monitor.

While the control-level parallelism is exploited by several Task Execution Units (TEUs) working in parallel, the data-level parallelism is exploited within each TEU, each of which implements a SIMD structure. The remainder of this section introduces the TEU and its functional blocks.
3.3.1 Task Execution Unit (TEU) overview

The TEU is based on ASP modules. Figure 3.3 shows a high-level view of the three main parts of a Task Execution Unit (TEU). The Massively Parallel Processor implements a Parallel Process Execution Unit (PPEU) based on the Associative String Processor (ASP). The ASP has been specifically designed for, and has successfully demonstrated:

machine versatility, which is inherent in its architecture

performance scalability, due to the virtually unlimited scalability of the ASP string

reduction in size, weight, power and cost, due to the fact that the string topology of the ASP architecture has been specifically developed for the use of microelectronics (VLSI, WSI) and packaging (MCM) technologies

operational efficiency, which is inherent in its architecture

future proofing, through regular advances in microelectronics.


3.3.2 The Massively Parallel Processor

The Massively Parallel Processor (MPP) array is a string of identical associative processing elements (APEs), as indicated in Fig 3.3. Each APE (as shown in Fig. 3.4) incorporates a 64-bit data register, a 6-bit activity register and a 70-bit parallel comparator. Moreover, an APE includes a single-bit full adder and four status flags: the arithmetic carry (C), the match and destination flags (M and D) and the activation flag (A). An APE also includes control logic for local processing and communication with other APEs. The APE can operate in three different data modes. The 64-bit data register can be configured for:

storage and bit-parallel processing of two 32-bit binary words

storage and bit-parallel processing of four 8-bit ternary byte fields

storage and bit-serial processing of one to three contiguous ternary bit fields of varying length (no more than 64 bits per field)



Fig.  3.4  The  Associative  Processing  Element  (APE) 

Each APE connects to an inter-APE communication network (IACN), which runs in parallel with the APE substring and is implemented as a shift register and a chordal ring. It can be dynamically reconfigured, thus providing a cost-effective emulation of common network topologies. As an activity-passing, rather than a data-passing, network it minimises data transfers. The chordal ring enables the IACN to be implemented as a hierarchy of ASP substrings. Thus, communication times are significantly reduced through automatic bypassing of those ASP substrings which do not include destination APEs. In a similar way, namely by bypassing faulty ASP substrings, fault tolerance of the ASP modules is achieved.

All APEs share a common bit-parallel data bus (called the scalar data bus (SDB)), common activity and control buses, and one feedback line called the Match Reply (MR). Through the Sequential Data and Control Interface, an external controller (the Instruction Stream Manager) maintains the buses, the feedback line, and the Link Left and Link Right ports of the inter-APE communication network. The link ports connect the APE string in one chip to that in another to form a longer string.


Fig  3.5  The  ASP  string 


ASP uses content addressing rather than location addressing. Thus, APEs are selected for subsequent parallel processing by comparing their data and activity contents with the states of the corresponding data and activity buses. In operation, the ASP supports a form of set processing in which the subset of active APEs (those which match the broadcast data and activity values) supports scalar-vector and vector-vector operations. The ASP either directly activates matching APEs or uses resources to indirectly activate other APEs. The match reply line indicates whether or not any APEs match. The controller either directly broadcasts scalar data or receives it via the bit-parallel data bus.
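Content addressing and set processing can be illustrated with a software model in which every APE's data word is compared in parallel with a broadcast key. A mask marks "don't care" bits, much like a ternary comparand. A scalar-vector operation is then applied only where the match flag is set, and the match reply is the OR of all flags. This is a behavioural sketch only. The names `associative_match` and `add_where_matched` are hypothetical and do not describe the ASP instruction set.

```python
def associative_match(words, key, mask, active=None):
    """Compare every APE's data word with a broadcast key in parallel.

    Only bits set in `mask` take part; the rest are "don't care".
    Returns (match_flags, match_reply).
    """
    if active is None:
        active = [True] * len(words)
    flags = [act and ((w ^ key) & mask) == 0 for w, act in zip(words, active)]
    return flags, any(flags)              # match reply: does any APE match?

def add_where_matched(words, flags, value, width=32):
    # scalar-vector operation applied only in APEs whose match flag is set
    return [(w + value) % (1 << width) if f else w for w, f in zip(words, flags)]

words = [0x12, 0x1F, 0x22, 0x13]
flags, reply = associative_match(words, key=0x10, mask=0xF0)
print(flags, reply)                        # [True, True, False, True] True
print([hex(w) for w in add_where_matched(words, flags, 1)])
# ['0x13', '0x20', '0x22', '0x14']
```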

Each substring may be partitioned into programmer-defined segments, separated by segment links, in support of structured data such as arrays, trees, tables, graphs etc. The segment links can be opened or closed to prevent or allow, respectively, the transmission of inter-APE communication signals between adjacent segments. Thus, each segment comprises a span of contiguous APE blocks, with internal block links closed and end block links converted to segment links. Users can create variable-length segments by writing segment links corresponding to the M-tags at the ends of APE blocks. Alternatively, they can create equal-length segments, comprising a power-of-two number of APEs, with a special command.
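The effect of segment links on the string can be sketched as a partition of APE indices. An open link between APE i-1 and APE i starts a new segment, and equal-length segments place a link every 2^k APEs. For simplicity this sketch works at single-APE granularity rather than in APE blocks, and the function names are hypothetical.

```python
def segments_from_links(n_apes, open_links):
    """Split a string of n APEs into segments.

    open_links: positions i where the link between APE i-1 and APE i is open
    (a segment boundary). Returns (start, end) index ranges, end exclusive.
    """
    bounds = sorted({0, n_apes, *(i for i in open_links if 0 < i < n_apes)})
    return list(zip(bounds, bounds[1:]))

def equal_segments(n_apes, log2_size):
    # equal-length segments of 2**log2_size APEs each
    size = 1 << log2_size
    return segments_from_links(n_apes, range(size, n_apes, size))

print(segments_from_links(10, [3, 7]))   # [(0, 3), (3, 7), (7, 10)]
print(equal_segments(16, 2))             # [(0, 4), (4, 8), (8, 12), (12, 16)]
```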


323-002  (neo02.wp5/net)Issue  2,  June  28, 1995 

Copyright  Reserved,  Aspex  Microsystems  Ltd.,  Uxbridge,  England 




Fig  3.6  Modular-MPC  ASP  Modules 


An ASP module (Fig 3.6) has a private Parallel Data Interface (PDI) for the transfer of parallel data. The Modular-MPC uses a hierarchy of parallel data pipelining to transfer parallel data from the interfaces to the outside (i.e. host interface, external I/O interface and monitor interface) to the Parallel Data Interfaces (PDIs) of the ASP modules and vice versa. The lowest level in this hierarchy comprises the Parallel Data Queue (PDQ). Data transfers between the PDQ and the ASP modules are called Primary Data Transfers (PDTs). The PDQ is implemented with an orthogonal data queuing mechanism. Data is loaded word-sequentially and bit-parallel, overlapped with parallel processing. It can subsequently be exchanged with the APEs in a word-parallel, bit-sequential manner. Because of the massive bandwidth of this exchange (e.g. with a bandwidth of 40 Mbit/s for each APE, a Modular-MPC comprising 64K APEs has a bandwidth for Primary Data Transfers (PDTs) of about 2.62 Tbit/s), the exchange time during which parallel processing has to be stopped, and consequently the overhead, can be reduced to a minimum. Due to its orthogonal structure, the PDQ scales linearly with the number of APEs in the string.
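The Primary Data Transfer figures follow from simple arithmetic. All APEs exchange data with the PDQ at the same time, so the aggregate bandwidth is the per-APE rate times the number of APEs. For 65,536 APEs at 40 Mbit/s this comes to about 2.62 Tbit/s. The stall time depends only on how many bits each APE moves. The 64 bits per APE in the example below is an assumed figure, and the function names are hypothetical.

```python
def pdt_bandwidth_bits_per_s(n_apes, per_ape_bits_per_s):
    # every APE exchanges data with the PDQ simultaneously, so rates add
    return n_apes * per_ape_bits_per_s

def pdt_time_s(bits_per_ape, per_ape_bits_per_s):
    # word-parallel, bit-sequential exchange: time set by bits per APE only
    return bits_per_ape / per_ape_bits_per_s

bw = pdt_bandwidth_bits_per_s(64 * 1024, 40e6)
print(f"{bw / 1e12:.2f} Tbit/s")   # 2.62 Tbit/s
stall = pdt_time_s(64, 40e6)       # processing stops for 1.6 microseconds
```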
ASP modules provide a simple means of scaling. Performance can be adjusted to the application requirements by changing the number of APEs per ASP module. I/O requirements, which manifest themselves in the required number of data channels, are met by changing the number of ASP modules.

3.3.3  The  Data  Stream  Manager 

The  DSM  contains  a  DSM  processor,  a  host  interface,  and  support  for  tertiary  data  transfers. 

The DSM processor is responsible for controlling the secondary and tertiary data transfers. It initialises the corresponding data controllers and also performs sequential operations. It acts as the bus master on the DSM internal bus, which allows it to access all resources on the bus. The DSM processor has hardware support for synchronising with the Instruction Stream Manager as well as for interrupting a higher-level controller via the host bus. The processor itself can be chosen according to the functionality required at this level; e.g. a SPARC processor could be used for high-functionality, high-performance applications.

The  Data  Stream  Manager  (DSM)  is  connected  to  the  Massively  Parallel  Processor  (MPP)  via  a 
multi-channel,  large  bandwidth  parallel  data  bus. 

The Data Stream Manager (DSM) consists of interfaces for parallel data I/O and the Parallel Data Buffer (PDB). The PDB, which supports image patching, is crucial for minimising delays caused by parallel data I/O. The PDB provides storage for at least one, but usually several, images, similar to a frame store. Each PDB is connected to the PDBs of other Task Execution Units (TEUs) via the Inter-Task Parallel Data Router (ITPDR).

Parallel processing applications vary considerably in their I/O requirements. In order to achieve a high level of machine versatility, the Data Stream Manager (DSM) is designed as a highly modular unit which can adapt to the whole spectrum of I/O requirements. In particular, the DSM is designed to minimise I/O overheads, through a successive increase in bandwidth between the stages of a data pipeline, and patching overheads, by providing a mechanism for high-bandwidth patch exchanges.

Depending  on  the  I/O  requirements  of  a  particular  application,  different  data  pipelines  (see  Fig  3.7)  can  be 
implemented. 

1. 1-stage pipeline

In this configuration of the DSM, the parallel data interfaces are directly connected to the ASP modules, and parallel data is buffered only in the Primary Data Queue. Consequently, any image storage is external. Two levels of parallel data transfer can be observed:

Secondary Data Transfer (SDT): Data is transferred between the external interfaces and the Primary Data Queue (PDQ). This transfer is comparatively slow, since the PDQs of all ASP modules have to share a single connection to the interfaces. However, Secondary Data Transfers (SDTs) can be overlapped with processing.


Primary Data Transfer (PDT): This is the word-parallel transfer between the Primary Data Queue (PDQ) and the ASP modules. It is a non-overlapped transfer with an extremely high I/O bandwidth.
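The benefit of overlapping the SDT with processing can be estimated with a rough timing model. In the overlapped case, each patch costs the longer of its processing time and its SDT, plus the non-overlapped PDT. In the serial case all three add up. The model ignores pipeline fill for the first patch, and the function names and example timings are hypothetical.

```python
def frame_time_overlapped(n_patches, t_process, t_sdt, t_pdt):
    # the SDT for the next patch runs while the current patch is processed;
    # only the PDT stops the APEs
    return n_patches * (max(t_process, t_sdt) + t_pdt)

def frame_time_serial(n_patches, t_process, t_sdt, t_pdt):
    # the same transfers with no overlap at all, for comparison
    return n_patches * (t_process + t_sdt + t_pdt)

args = (16, 10e-3, 8e-3, 0.1e-3)    # patches, processing, SDT, PDT (seconds)
print(frame_time_overlapped(*args), frame_time_serial(*args))
# roughly 0.162 s versus 0.290 s per frame
```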


Fig 3.7 DSM configuration options


2. 2-stage pipeline

A 2-stage pipeline is configured by including the Secondary Data Store as a further stage in the parallel data I/O pipeline. In this configuration, patch processing is supported by the DSM. The Secondary Data Store (SDS) implements a multi-frame store, which is large enough to store at least one, but typically several, images. By storing or loading sub-images while the current sub-image is being processed, the inefficiencies caused by patching are minimised. Frequent exchanges of sub-images with the
MPP are made possible by high-bandwidth parallel data I/O. Compared to the 1-stage pipeline, a further data transfer has been introduced:

Tertiary Data Transfer (TDT): This transfer takes place between the external interfaces and the Secondary Data Store (SDS) via a single data channel for image transfer.

Secondary Data Transfer (SDT): Unlike in the 1-stage pipeline, this transfer now takes place between the Secondary Data Store (SDS) and the Primary Data Queue (PDQ). It implements a fast, multi-channel patch transfer which is overlapped with processing, thus minimising overheads. The number of data channels for the Secondary Data Transfer (SDT) is chosen to meet the application requirements.

Primary Data Transfer (PDT): The PDT is implemented in the same way as described for the 1-stage pipeline.

3. 3-stage pipeline

With the introduction of a Tertiary Data Queue (TDQ), the DSM can be configured as a 3-stage pipeline. Depending on the I/O requirements of the application and the characteristics of typical memories, a performance advantage can be achieved by including the Tertiary Data Queue (TDQ) in the parallel data I/O pipeline. It is then used for the cost-effective minimisation of I/O overheads and as intermediate storage to buffer data between the interfaces and the Secondary Data Store (SDS). Four levels of parallel data transfer can be observed for the 3-stage pipeline:

Quaternary Data Transfer (QDT): This transfer takes place between the Tertiary Data Queue (TDQ) and the interfaces via a single data channel.

Tertiary Data Transfer (TDT): In contrast to the 2-stage pipeline, in this configuration the TDT takes place between the Tertiary Data Queue (TDQ) and the Secondary Data Store (SDS). The number of connections for this transfer can be configured according to application needs.

Secondary Data Transfer (SDT) and Primary Data Transfer (PDT): These transfers remain unchanged from the 2-stage pipeline.

The Secondary Data Store (SDS) provides storage for one or more image frames for fast patch processing. It therefore needs to be scalable according to the storage requirements of the application, independently of the number of Associative Processing Elements (APEs) and of the number of I/O data channels between the SDS and the MPP. Fig 3.8 shows the modes in which the SDS can be configured.


Fig 3.8 The two SDS configuration modes


Distributed Secondary Data Store (SDS): This mode implements static routing between the different blocks of the SDS memory and the ASP modules. It is used for those applications where ASP modules access data only in their assigned memory modules.

Shared Secondary Data Store (SDS): For a number of applications, ASP modules may need to access data in any memory module. In this case the SDS can be implemented in a shared mode with dynamic routing between memory modules and ASP modules. This dynamic routing is implemented with the Parallel Data Router. Consequently, through the flexible allocation of memory, the application flexibility of the Modular-MPC is increased significantly. A programmer sees the shared SDS as a single (shared) memory.

The Secondary Data Store memory is implemented as a set of memory modules. Each module holds a fraction
of the data kept in the SDS and is assigned its own private I/O channels. The storage capacity of a memory
module can be selected from a range of sizes. The SDS memory provides two modes of data filtering,
implemented by the byte select and the memory module inhibitor. With byte select, any set of bytes in a storage
word can be masked for write access such that only part of the memory word is updated. Byte masking is
common to all memory modules. Write access to any memory module can be inhibited. The inhibition of a
memory module can be data-dependent, i.e. only data with a specific signature is written to the SDS. The
inhibition of one memory module is independent of all other memory modules.
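The two filtering modes can be modelled in software. The sketch below is illustrative only and assumes a 32-bit storage word, a bit-per-byte mask (bit i selects byte i) and an arbitrary signature test supplied as a function; the names `masked_write` and `write_to_modules` are hypothetical and not part of the Modular-MPC.

```python
from typing import Callable, List

def masked_write(word: int, new: int, byte_mask: int, width_bytes: int = 4) -> int:
    """Update only the bytes of `word` selected by `byte_mask` (bit i -> byte i)."""
    result = word
    for i in range(width_bytes):
        if (byte_mask >> i) & 1:
            sel = 0xFF << (8 * i)
            result = (result & ~sel) | (new & sel)
    return result

def write_to_modules(modules: List[List[int]], addr: int, data: List[int],
                     byte_mask: int, inhibit: Callable[[int, int], bool]) -> None:
    """Write one word per memory module at `addr`.

    The same byte mask applies to every module, while each module's write can
    be inhibited independently, here by a data-dependent test."""
    for m, word in enumerate(data):
        if inhibit(m, word):
            continue  # this module keeps its old contents
        modules[m][addr] = masked_write(modules[m][addr], word, byte_mask)
```

For example, `masked_write(0x11223344, 0xAABBCCDD, 0b0101)` updates only bytes 0 and 2 and returns `0x11BB33DD`.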


The Parallel Data Router (PDR) implements the dynamic routing required for a shared SDS.
It should therefore be seen not as a general-purpose network but as a router. The PDR implements a
crossbar topology, i.e. any ASP module can exchange data with any memory module. The number of data
channels in the PDR can be adjusted according to application needs.

Similar  to  the  Secondary  Data  Store  (SDS),  the  TDQ  (see  Fig  3.9)  is  implemented  as  a  set  of  memory 
modules.  As  for  the  SDS,  each  memory  module  holds  only  a  fraction  of  the  data,  has  its  private  I/O  channel 
and  its  capacity  can  be  selected  from  a  range  of  sizes. 


3.3.4  The  Instruction  Stream  Manager 

The Instruction Stream Manager (ISM) consists of the host interface, the process scheduler (which
schedules the parallel data I/O control and the parallel process control) and a Sequential Process Execution
Unit (SPEU). A single-channel control connection joins the parallel process control with the MPP (SIMD
concept).


Fig 3.10 The Instruction Stream Manager (ISM)


Figure 3.10 presents an overview of the main functional blocks in the Instruction Stream Manager (ISM).

The process scheduler controls and ‘orchestrates’ all other units in the ISM. Through remote calls it starts
parallel processes in the parallel process control and sequential processes in the Sequential Process Execution
Unit (SPEU), and it initiates data transfers controlled by the parallel data I/O control. Furthermore, the
process scheduler uses the feedback from these units to align the further scheduling of processes. As the main
controlling unit in the ISM, it is the only functional block directly connected to the host workstation, via
the host interface. The process scheduler is assigned a private code store, which can be loaded via the
host interface.
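The feedback-driven orchestration can be illustrated with a dependency-based dispatcher. This is a hypothetical sketch, not the ISM's actual scheduling algorithm: each task is tagged with the unit that executes it (parallel process control, SPEU or parallel data I/O control), a remote call is modelled as appending to the dispatch order, and the completion feedback from a unit releases the tasks that depend on it.

```python
from collections import deque
from typing import Dict, List, Tuple

def dispatch_order(tasks: Dict[str, Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """tasks maps a task name to (executing unit, names of prerequisite tasks).

    Returns (unit, task) pairs in the order the scheduler would issue them."""
    waiting = {name: set(deps) for name, (_, deps) in tasks.items()}
    ready = deque(name for name, deps in waiting.items() if not deps)
    order = []
    while ready:
        name = ready.popleft()
        order.append((tasks[name][0], name))      # remote call to the unit
        for other, deps in waiting.items():       # feedback: task `name` finished
            if name in deps:
                deps.discard(name)
                if not deps:
                    ready.append(other)
    if len(order) != len(tasks):
        raise ValueError("cyclic task dependencies")
    return order
```

A real scheduler would keep several units busy at once; the sketch only shows how completion feedback drives further scheduling.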

The parallel process control generates the control stream for the Massively Parallel Processor (MPP). As a
cost-effective way to minimise control overheads, it is organised as a control hierarchy.

The micro controller calls blocks of operations which are remotely executed in the nano controller. While
the nano controller, which is specifically designed for fast repetition of blocks of operations, executes the
remote call, the micro controller has enough time to organise the "housekeeping" (e.g. branching) of the
parallel process control stream. Furthermore, the nano controller can concatenate sequential data, stored in
the Sequential Data Buffer, with control, so that both are sent to the MPP as a single instruction. Both the
micro controller and the nano controller have private code stores into which the required task and process
software library modules are downloaded via the host interface prior to processing.
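The benefit of the two-level hierarchy can be seen with a simple timing model. In this sketch, which assumes hypothetical cycle counts and ignores every other source of delay, the micro controller's housekeeping for the next remote call overlaps the nano controller's execution of the current block, so only the longer of the two contributes to the stream time.

```python
from typing import Sequence

def stream_cycles(nano: Sequence[int], house: Sequence[int], overlap: bool = True) -> int:
    """Cycles to issue a control stream of len(nano) remote calls.

    nano[i]  : cycles the nano controller spends repeating block i
    house[i] : micro controller housekeeping (e.g. branching) before call i
    """
    if len(nano) != len(house):
        raise ValueError("one housekeeping entry per remote call")
    if not overlap:
        return sum(nano) + sum(house)
    total = house[0] if house else 0            # nothing to hide the first call behind
    for i, n in enumerate(nano):
        nxt = house[i + 1] if i + 1 < len(house) else 0
        total += max(n, nxt)                     # housekeeping hidden behind block i
    return total
```

With three blocks of 10 cycles and 3 cycles of housekeeping each, `stream_cycles([10, 10, 10], [3, 3, 3])` gives 33 cycles, against 39 with `overlap=False`.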

The Sequential Process Execution Unit (SPEU) is dedicated to the execution of sequential processes. It thus
plays a crucial role in minimising overheads due to operations with little natural parallelism. To this end,
the SPEU includes a fast floating-point microprocessor with a dedicated code store for sequential process
code, which is downloaded via the host interface.

Furthermore, the SPEU includes a data store dedicated to sequential data, called the Sequential Data Buffer
(SDB). The SDB stores three kinds of sequential data: intermediate data used as sequential data in the MPP,
sequential data being evolved by the microprocessor of the SPEU, and parameters for the parallel data I/O
control.

The SDB also acts as a bridge to move parallel data from the Data Stream Manager to the
Instruction Stream Manager (ISM), where it can be used as sequential data for further processing.

The parallel data I/O control governs data transfers between all levels in the Data Stream Manager.
Depending on the number of stages in the Parallel Data Store, the parallel data I/O control implements a
hierarchy of modules (see Figure 3.9):

1-stage pipeline: only the Secondary Data Transfer (SDT) control is required

2-stage pipeline: SDT control and TDT control have to be implemented

3-stage pipeline: all levels of control (SDT control, TDT control and QDT control) are required
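The rule above amounts to enabling the first k coordinating controllers for a k-stage Parallel Data Store. A minimal sketch (the function and constant names are hypothetical):

```python
from typing import List

# Coordinating controllers, ordered from the Secondary Data Store outwards.
COORDINATING_CONTROLS = ("SDT control", "TDT control", "QDT control")

def required_controls(stages: int) -> List[str]:
    """Coordinating controllers needed for a Parallel Data Store with `stages` stages."""
    if stages not in (1, 2, 3):
        raise ValueError("the parallel data I/O pipeline has 1, 2 or 3 stages")
    return list(COORDINATING_CONTROLS[:stages])
```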

Figure 3.11 demonstrates how the functional units in the parallel data I/O control can be partitioned into two
sets:

- coordinating units: These include SDT control, TDT control and QDT control. They coordinate
transfers between units in the DSM by controlling the address generating units.

- address generating units: Although the rather large number of address generating units might seem
confusing at first glance, these units follow a simple architectural principle. Each functional unit in the
DSM parallel data I/O pipeline is assigned a private controller together with, if necessary, an address
generator, as follows:
the external interfaces are controlled by the interface control;

the Tertiary Data Queue is controlled by the TDQ address generator, which also provides the
addresses for each memory module in the TDQ;

each memory module in the Secondary Data Store (SDS) is controlled and provided with addresses
by a local address generator;

the Parallel Data Router (PDR) is configured and controlled by the PDR configuration unit;
the Parallel Data Interfaces (PDI) are controlled by the PDI control and the ASP channel inhibitor.


Fig 3.11 I/O control (control requirements for 1-stage, 2-stage and 3-stage pipelines)


The Secondary Data Store (SDS) and the Tertiary Data Queue (TDQ) are each accessed by two transfers: SDT
and TDT for the SDS, and TDT and QDT for the TDQ. Their assigned address generators are therefore able to switch
between two different contexts, depending on the transfer being executed at a given time.
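A context-switching address generator can be sketched as follows. The base/stride/count description of a transfer and the class names are assumptions made for illustration; the report does not specify the generators' internal parameters. The point is that each transfer keeps its own saved progress, so the generator can alternate between, say, SDT and TDT without losing its place in either.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class TransferContext:
    base: int     # first address of the transfer
    stride: int   # address increment between successive words
    count: int    # number of words in the transfer

class DualContextAddressGenerator:
    """Hypothetical model of an address generator shared by two transfers."""

    def __init__(self, contexts: Dict[str, TransferContext]):
        if len(contexts) != 2:
            raise ValueError("exactly two contexts are supported")
        self.contexts = contexts
        self.position = {name: 0 for name in contexts}   # saved progress per context

    def next_address(self, transfer: str) -> int:
        ctx = self.contexts[transfer]                     # switch to this context
        i = self.position[transfer]
        if i >= ctx.count:
            raise RuntimeError(f"{transfer} transfer already complete")
        self.position[transfer] = i + 1
        return ctx.base + i * ctx.stride
```

For example, with an SDT context starting at 0 (stride 1) and a TDT context starting at 100 (stride 8), interleaved calls yield 0, 100, 1, 108.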

The offset generation for the local address generators is done in a similar way to that described for the
TDQ address generation. However, additional functionality is included for the case of a shared SDS, when
the Parallel Data Router is included in the SDS. In this configuration, for Secondary Data Transfers, offsets
have to be generated for each local address generator, the Parallel Data Router has to be configured, and ASP
channels have to be inhibited in cases of access contention to certain memory modules in the SDS. All this is
handled by the global address generator, which executes these tasks overlapped with the actual address
generation for SDT in the local address generators. Parameters for the different address generators can also
be downloaded from the Sequential Data Buffer (SDB).
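The contention handling of the global address generator for a shared SDS can be sketched as below. The fixed-priority arbitration (lowest-numbered ASP channel wins) and the retry-next-cycle policy are assumptions for illustration only. For each cycle, the sketch produces the PDR crossbar setting, the offset for each memory module's local address generator, and the list of inhibited ASP channels.

```python
from typing import Dict, List, Optional, Tuple

Request = Optional[Tuple[int, int]]   # (memory module, offset) or None if idle

def plan_cycle(requests: List[Request]):
    router: Dict[int, int] = {}    # ASP channel -> memory module (crossbar setting)
    offsets: Dict[int, int] = {}   # memory module -> offset for its local address generator
    inhibited: List[int] = []      # ASP channels held back by contention
    for channel, req in enumerate(requests):
        if req is None:
            continue
        module, offset = req
        if module in offsets:      # module already claimed in this cycle
            inhibited.append(channel)
        else:
            router[channel] = module
            offsets[module] = offset
    return router, offsets, inhibited

def plan_transfer(requests: List[Request]):
    """Repeat planning until every request is served; returns one plan per cycle."""
    cycles = []
    pending = list(requests)
    while any(r is not None for r in pending):
        router, offsets, inhibited = plan_cycle(pending)
        cycles.append((router, offsets))
        pending = [r if ch in inhibited else None for ch, r in enumerate(pending)]
    return cycles
```

Three channels requesting modules 0, 1 and 0 are served in two cycles: channel 2 is inhibited in the first and routed in the second.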
4

NEO  detection  synthesis 


This section describes the synthesis of the NEO detection application using a modular massively parallel
computer currently being developed at Aspex Microsystems Ltd. The NEO detection implementation has
been studied at two levels of abstraction, namely the task level and the program level, in order to increase
overall detection performance and efficiency.

4.1 Synthesis approach

The approach adopted to implement the NEO detection program aims to maximise overall PE utilisation
and efficiency at both the task level (pixel:PE ratio) and the program level (effective PE usage while
processing task sequences). At the task level, PE utilisation depends on the pixel:PE ratio used
(or the size of the virtual PE) for a sequence of tasks that requires no I/O, while maximum task
efficiency is ensured by reducing the overall overheads, especially those due to communications. The
choice of a particular task implementation is usually a compromise between PE utilisation and task-level
efficiency. Although a task might appear to be running efficiently, its PE utilisation may be low, which
affects overall program efficiency: the effective PE count falls, and hence so does the size of the active
patch processed by the array.
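The effect of virtual PE size on the effective PE count can be shown with a little arithmetic. The sketch assumes, hypothetically, that a task's per-pixel working set is spread over as many physical PEs as its memory demand requires; the 64-bit PE memory in the example is an illustrative figure, not an ASP specification.

```python
import math

def effective_array(num_pes: int, bits_per_pixel: int, bits_per_pe: int):
    """Return (virtual PE size, effective PE count, memory fill of the virtual PE).

    virtual PE size : physical PEs ganged together to hold one pixel's working set
    effective PEs   : pixels (the active patch size) the array can process at once
    memory fill     : fraction of the ganged PEs' memory actually needed
    """
    vpe = math.ceil(bits_per_pixel / bits_per_pe)
    effective = num_pes // vpe
    fill = bits_per_pixel / (vpe * bits_per_pe)
    return vpe, effective, fill
```

For instance, a task needing 100 bits per pixel on a 4K-PE array with 64 bits per PE gives a virtual PE of 2 PEs, halving the active patch to 2048 pixels.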

Once tasks have been developed, the sequencing and partitioning of tasks is considered for overall
program synthesis. During processing, the data is transformed from its iconic 2D data structure
into a collection of object-based iconic structures and finally into a symbolic description for each
object, as illustrated in Fig. 4.1.


Fig. 4.1 Evolution of data during processing (example: image with 2 astronomical objects)
The implementation of the NEO detection program is based on exploiting the massive reduction in data
volume that results from the low- and intermediate-level image processing operations typical of astronomical
image processing. Exploiting these evolving data structures can also reduce the communication
overheads incurred during task processing.

At the program level, PE utilisation is the ratio of active PEs to the total number of PEs. If the
implementation does not match the evolving data structures, available resources are under-utilised.
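Program-level utilisation over a task sequence can be estimated as a time-weighted ratio of active to total PEs. The time weighting and the example figures are assumptions; they simply show how running object-level tasks on a pixel-sized array drags utilisation down.

```python
from typing import Iterable, Tuple

def program_utilisation(tasks: Iterable[Tuple[int, float]], total_pes: int) -> float:
    """tasks: (active PEs, duration) for each task in the sequence."""
    tasks = list(tasks)
    busy = sum(active * t for active, t in tasks)
    elapsed = sum(t for _, t in tasks)
    return busy / (total_pes * elapsed)
```

A 4K-PE array fully active for one task and then only 64 PEs active for an equally long object-level task averages just over 50% utilisation.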

Given these evolving data structures and the massive data reductions (95-100%), the tasks have been
partitioned into two main groups. The tasks responsible for reducing images to objects generally have
to be able to keep up with the data rate. These tasks, namely image restoration, object enhancement and
segmentation, are involved in time-dependent processing (TDP). Exploiting the reduction from
pixel-based to object-based data, the subsequent tasks, namely the analysis and decision tasks, are
object-dependent (ODP). Additionally, analysis is done on object stamps and is hence iconic-based,
while decision involves symbolic processing. This partition can be seen in Fig 4.2.


Fig 4.2 Task partitioning: time-dependent and object-dependent processing

4.2 NEO application-specific synthesis

The NEO detection application is based on the GODS* program [SAV94] developed at O.C.A.
However, as the decision stage is the least complex and least computation-intensive of all the tasks, the
modelling of system performance on the Modular-MPC system was performed for the more intensive
TDP and ODP iconic-based stages.

4.2.1  TDP  implementation 

Raw images acquired from the CCDs are loaded into the PEs at 1 pixel/PE. Each pixel is 16 bits in length.
The TDP program flow is shown in Fig. 4.3.


*GODS: Global Orbit Determination System
TDP tasks include the "Mexican Hat" filtering, thresholding and maxima detection. The convolution mask
that represents the "Mexican Hat" filter is octa-symmetric, as seen in Fig 4.4. Performance increases may
typically be achieved by exploiting this symmetry to reduce the overall number of multiplications from
n x n to approximately (n x n)/8. Also, given the Gaussian-based nature of the filter, separable filters may
be employed. Separable filters reduce the overall computation from n x n to about 4n.
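The multiplication saving from octa-symmetry can be checked with a short sketch. It assumes an odd n x n mask whose coefficient depends only on the unordered pair (|dx|, |dy|), which is what 8-fold symmetry means; the mask values used in the test are arbitrary, not the actual "Mexican Hat" coefficients, and the code is a sequential model rather than ASP code. Pixels sharing a coefficient are summed first, so one multiplication is needed per symmetry class: (r+1)(r+2)/2 for radius r, which approaches n²/8 for large n.

```python
from typing import List, Tuple

def symmetry_class(dx: int, dy: int) -> Tuple[int, int]:
    a, b = abs(dx), abs(dy)
    return (max(a, b), min(a, b))

def octa_symmetric_mac(img: List[List[int]], mask: List[List[int]],
                       cx: int, cy: int) -> Tuple[int, int]:
    """Filter output at (cx, cy) and the number of multiplications used."""
    r = len(mask) // 2
    sums = {}
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            key = symmetry_class(dx, dy)
            sums[key] = sums.get(key, 0) + img[cy + dy][cx + dx]   # additions only
    total = 0
    for (a, b), s in sums.items():
        total += mask[r + b][r + a] * s    # one multiplication per symmetry class
    return total, len(sums)
```

For a 5 x 5 mask this uses 6 multiplications instead of 25, at the cost of extra additions and temporary sums, which is exactly the storage trade-off discussed next.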

However, studies of several convolution techniques [JOS94b] for a
string-connected architecture have revealed that minimising
multiplications and using separable filters do not necessarily
increase performance. The most suitable technique is a compromise
between computation, intermediate result storage and the
communication required between PEs. This is because PE
utilisation decreases as more intermediate storage is needed, which
increases virtual PE sizes, resulting in larger communication overheads
and hence reduced task efficiency.

The implementation technique adopted is based on a method that exploits the string-connected architecture
of the ASP. This method is accomplished in two major steps, which complete the multiply-accumulation on
either half of the mask. Raw data is first copied into free memory within each PE. This establishes an image
copy that can be moved by shifts equal to the distance of the next involved PE along the string. Immediate
neighbours, east or west depending on which step is being performed, are communicated first for the
convolution. As the copy is moved, the pixels following it ride on the transfer, after which they are stored
again; hence the method is called piggy-backing. The total communication depends only on the diameter of
the convolution mask along the string. The result of the multiply-and-accumulate is stored in the result
space, so no additional memory is required for intermediate results.

Fig 4.3 Flow diagram for Time Dependent Processing
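The piggy-back scheme can be modelled on a one-dimensional string. This is a simplified sketch under stated assumptions: NumPy arrays stand in for the PE string, the mask is one-dimensional with odd length, the ends of the string are zero-padded, and every shift moves the image copy by one PE (in the 2D case the distance would be the spacing of the next involved PE). The copy is moved one step at a time and each multiply-accumulate adds into a single result array, so no intermediate storage is needed and the number of shifts is the mask diameter minus one.

```python
import numpy as np

def piggyback_filter(x, w):
    """y[i] = sum over k in [-r, r] of w[r + k] * x[i + k], zero outside the string.

    Returns (y, number of single-PE shifts used)."""
    x = np.asarray(x, dtype=np.int64)
    r = len(w) // 2
    result = w[r] * x                  # centre tap needs no communication
    shifts = 0
    for direction in (1, -1):          # step 1: east half, step 2: west half
        copy = x.copy()                # image copy in free PE memory
        for k in range(1, r + 1):
            if direction == 1:
                copy = np.concatenate((copy[1:], [0]))    # copy[i] now holds x[i + k]
            else:
                copy = np.concatenate(([0], copy[:-1]))   # copy[i] now holds x[i - k]
            shifts += 1
            result = result + w[r + direction * k] * copy  # accumulate in result space
    return result, shifts
```

For a 5-tap mask the sketch uses 4 shifts regardless of the string length, matching the statement that communication depends only on the mask diameter.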
Fig. 4.4 Symmetry of the "Mexican Hat" filter
Thresholding is achieved by a scalar-vector comparison. Pixels found below the threshold are set to zero.
After thresholding, objects are defined by their point of maximum. Maxima are determined by
comparison with their 4-way connected nearest neighbours. Maxima detection begins by considering all
pixels belonging to the set of possible objects. As the algorithm proceeds, a maxima set is built by rejecting
all candidates that have at least one neighbour greater than themselves.
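Thresholding and 4-connected maxima detection can be written as whole-array operations, which mirrors how every PE evaluates the same comparison in parallel. This NumPy sketch is a sequential model, not ASP code; as in the description above, a candidate survives unless some 4-neighbour is strictly greater, so equal-valued neighbours are both kept.

```python
import numpy as np

def threshold_and_maxima(img, t):
    """Return (thresholded image, boolean mask of maxima)."""
    img = np.asarray(img)
    img = np.where(img < t, 0, img)            # scalar-vector comparison
    candidate = img > 0                          # all remaining pixels start as candidates
    padded = np.pad(img, 1, constant_values=0)
    h, w = img.shape
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        candidate &= ~(neighbour > img)          # reject if a neighbour is greater
    return img, candidate
```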

Before objects can be read out, x-y co-ordinates are generated per patch. Instead of a log n method, an
algorithm that uses an external address-based look-up table is employed. Using the ASP's activation
modes, only patch-width and patch-height reads and writes are necessary.
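One way to see why only patch-width plus patch-height writes are needed is the following model. It is a hypothetical sketch: the patch is assumed to be stored row-major along the string, and an activation mode is modelled as selecting all PEs in one column (or row) so that a single broadcast write reaches all of them; the external look-up table mechanism itself is not modelled.

```python
from typing import List, Tuple

def generate_coordinates(width: int, height: int) -> Tuple[List[int], List[int], int]:
    n = width * height
    x = [0] * n
    y = [0] * n
    writes = 0
    for col in range(width):                     # activate column `col`, broadcast its x
        for pe in range(col, n, width):
            x[pe] = col
        writes += 1
    for row in range(height):                    # activate row `row`, broadcast its y
        for pe in range(row * width, (row + 1) * width):
            y[pe] = row
        writes += 1
    return x, y, writes
```

A 3 x 2 patch therefore needs 3 + 2 = 5 broadcast writes rather than one write per PE.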


Fig 4.5 shows the memory utilisation of each PE during processing.


Fig 4.5 Memory utilisation per APE during TDP (data register, activity register, maximum mark)

At this point in processing, the overall 2-D image structure has been reduced to a collection of points
representing astronomical objects. Before objects may be read out, information regarding their location in
terms of pixel co-ordinates has to be generated and associated with them. Pixels are now 32 bits long,
for at least 3 decimal places of accuracy, while the co-ordinate length depends on the patch size. The (x,y)
pixel co-ordinates of the maxima points are read out through scalar reads via the Sequential Data
Interface. Since the number of objects expected to be detected per patch is small, it may be more efficient to
use scalar reads rather than transfer co-ordinates through the parallel data buffer (PDB). The
processed patch is also read out through the PDB.
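The choice between scalar reads and a PDB transfer is a simple break-even test. In this sketch the cost figures are hypothetical inputs in any consistent time unit; the report does not give them.

```python
def cheaper_readout(num_objects: int, scalar_cost_per_object: float, pdb_cost: float) -> str:
    """Pick the cheaper way to read out maxima co-ordinates for one patch."""
    return "scalar" if num_objects * scalar_cost_per_object < pdb_cost else "pdb"

def break_even_objects(scalar_cost_per_object: float, pdb_cost: float) -> int:
    """Largest object count for which scalar reads are still cheaper."""
    n = int(pdb_cost // scalar_cost_per_object)
    return n - 1 if n * scalar_cost_per_object >= pdb_cost else n
```

With a scalar read costing 2 units per object and a PDB transfer costing 100, scalar reads win for up to 49 objects per patch.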


323-002  (neo02.wp5/net)Issue  2,  June  28, 1995 

Copyright  Reserved,  Aspex  Microsystems  Ltd.,  Uxbridge,  England 




4.2.2  ODP  Implementation 


The ODP program is shown in Fig 4.6. For the analysis, stamps
extracted from the processed image around detected objects, centred on
the maxima co-ordinates, are loaded into the array. Each stamp occupies
(mw+5) PEs, as shown in Fig 4.7. The 5-PE cell pack consists of one
intermediate accumulator cell, the Moments X and Y cells and the Centroid
X and Y cells. This cell pack neighbours the central PE of the stamp
so that communication overheads are minimised. This
scheme provides sufficient memory for any intermediate results and also
for the large moment results. To avoid dynamic segmentation,
processing is tightly coupled to active sets only, thus preventing
unnecessary writes of intermediate results in inactive cells. Also, by
allocating the necessary memory for each result, excessive overheads due to
clearing PEs to make space for centroid calculations can be avoided.
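The cost of the (mw+5) layout in array capacity is easy to work out. The sketch reads mw as the number of pixels in an m x w stamp, which is an interpretation of the notation rather than something the text states, and the 4K-PE array is just an example.

```python
def stamps_per_array(num_pes: int, m: int, w: int, pack_cells: int = 5):
    """Return (PEs per stamp, stamps that fit in one load of the array)."""
    pes_per_stamp = m * w + pack_cells    # stamp pixels + accumulator, moments and centroid cells
    return pes_per_stamp, num_pes // pes_per_stamp
```

On 4096 PEs this gives 136 stamps of 5 x 5 pixels (30 PEs each) but only 17 stamps of 15 x 15 pixels (230 PEs each).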


Fig. 4.6 Flow diagram for Object Dependent Processing

The analysis begins with the moments calculation. The moments mask is
bi-symmetric, as seen in Fig 4.8. The technique employed reduces
multiplications by accumulating all pixels that share the same multiplier. Again, using the piggy-back
method reduces communication overheads directly. This method is useful since multiplication
involves 45-bit results, again so that a precision of 3 decimal places is maintained.
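Accumulating before multiplying can be illustrated for first-order moments. The sketch assumes, for illustration, that the X and Y moment masks weight each pixel by its column or row offset from the stamp centre, so all pixels in a column share one multiplier and the +d and -d columns differ only in sign; the exact masks of Fig 4.8 may differ. Only r multiplications per axis are then needed instead of n².

```python
from typing import List, Tuple

def first_moments(stamp: List[List[int]]) -> Tuple[int, int, int]:
    """Return (Mx, My, M0) about the centre of an odd n x n stamp."""
    n = len(stamp)
    r = n // 2
    col_sums = [sum(line[j] for line in stamp) for j in range(n)]  # pixels sharing an x multiplier
    row_sums = [sum(line) for line in stamp]                        # pixels sharing a y multiplier
    mx = sum(d * (col_sums[r + d] - col_sums[r - d]) for d in range(1, r + 1))
    my = sum(d * (row_sums[r + d] - row_sums[r - d]) for d in range(1, r + 1))
    m0 = sum(row_sums)
    return mx, my, m0
```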


Fig 4.7 Mapping moments on to a string (data cells and intermediate result cell)


Centroid calculations involve only the determination of the numerator and denominator values (a and b),
so that the division can be performed outside the ASP, where it is less costly.
Besides, even in the decision stage, centroid values are only required for astronomical co-ordinate
reduction rather than for the detection of asteroid motion.
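On the host side, the centroid then needs one division per axis. A minimal sketch, assuming that a is the first moment about the stamp centre, b is the stamp's total intensity (as in a conventional intensity-weighted centroid) and the maxima co-ordinates from the TDP give the stamp centre:

```python
from typing import Tuple

def centroid(maximum: Tuple[int, int], a: Tuple[int, int], b: int,
             ndigits: int = 3) -> Tuple[float, float]:
    """Centroid from ASP read-out values: maximum + a / b, rounded to 3 decimals."""
    if b == 0:
        raise ZeroDivisionError("stamp has zero total intensity")
    (x0, y0), (ax, ay) = maximum, a
    return round(x0 + ax / b, ndigits), round(y0 + ay / b, ndigits)
```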

Only the results of the moments and centroids are read out. These are used to build a list of records of
detected objects. The lists corresponding to the different frames of the same field are then used for the
decision process. Given the huge data reduction achieved by the TDP stage and the sparse, random
distribution of astronomical objects, there will be periods when the ODP is not run, because TDP patches
contain no objects and/or there is insufficient data to fill the array. During these times, decision
tasks may be executed.
Fig 4.8 Symmetry in moments masks: a) X moments and b) Y moments

4.2.3  NEO  detection  program  implementation 

The TDP stage accepts raw data from the CCD controllers via the external I/O interface. CCDs are read
line by line and all CCDs are read simultaneously; for example, a column of nine 2k x 2k CCDs would give a
burst of 2k x 9 pixels per read-out. Incoming CCD data is double-buffered. Patches are extracted from
the buffer and processed. Note that a circular buffering scheme should be employed to maintain
scan continuity.
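A circular line buffer lets overlapping patches be cut from a continuous scan without copying. The class below is a hypothetical sketch: capacity is counted in CCD lines, and a patch request succeeds only while all of its lines are still resident.

```python
from typing import Any, List

class LineRingBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines: List[Any] = [None] * capacity
        self.received = 0                          # total lines written so far

    def push(self, line: Any) -> None:
        self.lines[self.received % self.capacity] = line   # overwrite the oldest slot
        self.received += 1

    def patch(self, first: int, height: int) -> List[Any]:
        """Lines first .. first + height - 1 of the scan."""
        oldest = self.received - self.capacity
        if first < max(oldest, 0) or first + height > self.received:
            raise IndexError("requested lines are not resident in the buffer")
        return [self.lines[i % self.capacity] for i in range(first, first + height)]
```

Successive patches of height H can then be requested at first = 0, H - o, 2(H - o), ... for an overlap of o lines, as long as the buffer holds at least H lines.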

Once processing is complete, the TDP passes the processed patches, rather than object stamps, together with
the list of detected objects to the second stage. Processed patches are passed from the TDP to the ODP stage
instead of object stamps in order to maintain some level of determinacy in the data size, and hence in the
communication time. Object stamps introduce data duplication between stamps, as star regions may
overlap. Hence, if a patch contains a star cluster, many objects may be discovered, resulting in "data
explosion". The object co-ordinates obtained from the TDP are patch co-ordinates; they are converted
to absolute co-ordinates before being passed to the ODP. The overall program
requirements are shown in the data and control flow diagram in Fig 4.9.
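The conversion is an offset by the patch origin. The sketch assumes patches are laid out left to right with a fixed overlap of `overlap` pixels between neighbours, and that the origin's y is the first scan line of the patch; both are assumptions, since the report does not give the patch geometry.

```python
from typing import List, Tuple

def patch_origin(patch_index: int, first_line: int, patch_width: int,
                 overlap: int) -> Tuple[int, int]:
    """Absolute (x, y) of a patch's top-left pixel."""
    return patch_index * (patch_width - overlap), first_line

def to_absolute(objects: List[Tuple[int, int]], origin: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Convert patch co-ordinates of detected objects to absolute co-ordinates."""
    x0, y0 = origin
    return [(x0 + x, y0 + y) for x, y in objects]
```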
Fig 4.9 Data flow and control flow diagram for the NEO detection program


4.2.4  Modular-MPC  system  implementation  for  NEO  detection 

Two approaches have been studied for the overall system, a 2-stage and a 1-stage processing pipeline, as
seen in the overall functional block diagram in Fig. 4.10. Both approaches are effectively data driven, i.e.
the TDP begins processing once the buffer is filled, while the ODP begins when enough object stamps
have been accumulated.

In the 2-stage pipeline, the TDP and the ODP stages are spatially partitioned (Fig 4.10a) and
operate asynchronously. The 1-stage processing pipeline (Fig. 4.10b) multiplexes between the TDP and
the ODP stages, i.e. they are temporally partitioned and operate on the same stage. With this approach, the
TDP is repeated until enough object stamps have accumulated to run the ODP. The number of repetitions
necessary depends on the expected object detection rate, and hence on the viewing depth for which the
imaging unit has been configured.
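The temporal partitioning of the 1-stage pipeline can be sketched as a loop that keeps running TDP until the accumulated objects fill the ODP capacity of the array, then switches the same array to ODP. The `tdp` and `odp` callables and the capacity are hypothetical stand-ins; the sketch also records how many TDP repetitions preceded each ODP run, which is the quantity the text says depends on the expected detection rate.

```python
from typing import Callable, Iterable, List, Tuple

def run_one_stage(patches: Iterable, tdp: Callable[[object], List],
                  odp: Callable[[List], List], capacity: int) -> Tuple[List, List[int]]:
    pending: List = []
    results: List = []
    repetitions: List[int] = []
    reps = 0
    for patch in patches:
        pending.extend(tdp(patch))           # array in TDP mode
        reps += 1
        while len(pending) >= capacity:      # enough objects: switch array to ODP
            results.extend(odp(pending[:capacity]))
            pending = pending[capacity:]
            repetitions.append(reps)
            reps = 0
    if pending:                              # flush the remainder at the end of the scan
        results.extend(odp(pending))
        repetitions.append(reps)
    return results, repetitions
```

With 3 objects per patch and room for 5 objects in the ODP, the array alternates between two and one TDP repetitions per ODP run.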
Fig 4.10 NEO detection system with TDP and ODP processing stages: a. 2-stage processing pipeline; b. 1-stage processing pipeline
5

NEO  Simulation  &  evaluation 

The simulation for the NEO detection program was done on the ASP System Testbed for Research and
Application (ASTRA). The results of this simulation could then be applied to the appropriate system
model with the task partition described in the previous section. The ASTRA system was developed by
Aspex Microsystems Ltd to develop algorithms and assess the ASP in a wide range of off-line signal and
data processing applications.

ASTRA can be represented functionally as comprising four blocks: the high-level controller, the
intermediate-level controller, the low-level controller and the processing array, as seen in Fig 5.1. It is
based on the traditional 64-PE VLSI implementation of the ASP string. The testbed can accommodate
up to 8K processors, with 2K processors on each 9U hyper-extended Eurocard. The host for ASTRA is
a Sun workstation, which provides overall program control, external data storage and user interface facilities.
An intermediate controller based on a Motorola 68030 handles ASP procedure sequencing and also
the array I/O. The array is controlled by an AMD 29330 microsequencer and microprogram
storage. A second-generation ASTRA, currently being developed, could use either the 64-bit
VLSI or the hybrid-WSI implementations. Each board would have about 8K processors using the hybrid
WSI chips, and the host would be a Sun SPARCstation.


Fig 5.1 The ASTRA system


The main objective of the simulation was to obtain a realistic assessment of the ASP-based tasks within
the system's hardware and software constraints. Image restoration and enhancement, segmentation and
analysis algorithms were written and developed on ASTRA. As only minimal instruction changes would
exist between the VLSI ASP implementation and later-generation ASP chips, the algorithms, and hence the total
algorithm slot times, would remain similar.

The program implementation of the pixel-based and iconic-based tasks was sequential in nature, since
only a single instruction thread is possible on ASTRA. With the functionality provided by the ASP
compiler, list files may be generated revealing the macro-to-micro instruction breakdown. This list is
useful in determining the overall time slots required per array procedure.

Table 5.1 below gives the execution times of the different TDP and ODP iconic-based tasks for
varying segment widths and operator sizes.


Task                      Object size 5     Object size 15
"Mexican Hat" filter*     1,178 - 1,357     3,578 - 4,205
Threshold                 (illegible)       (illegible)
Maxima detection*         57 - 236          59 - 238
Moments X                 545               1,679
(illegible)               545               1,679
(illegible)               77                85

*Segment width from 32-256 PEs


Table 5.1 Timings for the TDP and ODP tasks

The throughputs of the TDP and ODP iconic-based stages are given in Table 5.2 for object sizes
of 5-15 pixels (equivalent to the operator size). The tasks were modelled to reflect their relation to the
patch width, the number of processing elements, the size of the operators, the number of CCDs and
other system variations such as the number of I/O channels.


Machine width   TDP (O=5)   TDP (O=15)   ODP (O=5)    ODP (O=15)
4K              2.94        1.03         116,894      5,174
                5.89        2.06         233,790      10,348
                11.78       4.11         467,680      20,696
                23.56       8.22         935,160      41,392
                47.11       16.45        1,870,320    82,783

Table 5.2 Throughput for the TDP and ODP stages
The large variation in timings for the TDP stage with increasing PEs is due not only to the co-ordinate
generation but mainly to the scalar reads required for object co-ordinates, and is hence
dependent on the number of objects detected. The throughput of the ODP could be increased even further
by trading off communication overheads against a more compactly packed object stamp on the ASP
string.

With the above figures for the different tasks, appropriate system models were then designed to evaluate
the detection program on the Modular-MPC. The factors taken into consideration in the models are mainly
related to processing and data movement. Table 5.3 and Table 5.4 give possible configurations for a 1,
2 or 3 CCD column system (using Kodak 2k x 2k CCDs) at varying acquisition rates (sidereal rates), for the
smallest and largest expected object sizes, for the 2-stage and 1-stage pipeline implementations respectively.
The larger configurations use the multi-channel input to reduce the I/O overheads. Thus, from the tables below, a
16K system would produce 14 objects after the TDP stage. And if the system had 16 Kodak 2K x 2K
CCDs in a single column, it would be able to process objects of size less than or equal to 15 and 5 at sidereal
rates of less than or equal to 3 and 10 respectively.


TDP PEs   1 CCD column        2 CCD columns       3 CCD columns       ODP PEs
?         (15,3)   (5,10)     (15,1)   (5,6)      (15,1)   (5,4)      4K
?         (15,6)   (9,10)     (15,3)   (5,9)      (15,2)   (5,6)      5K
?         (15,8)   (11,10)    (15,4)   (5,10)     (15,2)   (5,7)      7K
40K       (15,10)  (15,10)    (15,5)   (7,10)     (15,3)   (5,7)      8K
?         (15,10)  (15,10)    (15,10)  (15,10)    (15,6)   (15,7)     15K

Table 5.3 Performance of various PE configurations for a given object size (O) and sidereal rate (S), represented as (O,S).
For each number of CCD columns, the first pair shows the maximum object size (O_max) possible with the minimum sidereal
rate (S_min) and the second the minimum object size (O_min) with the maximum sidereal rate (S_max), for a 2-stage pipeline.


The number of PEs for the object-dependent processing (ODP) stage was determined by the average
number of objects expected. The Spaceguard Report had recommended an approximate object detection
density of 30,000 objects/sq.deg. Thus, given the machine size, the CCD angular resolution and this
density, the average number of objects per patch is calculated. Based on this object rate and the number
of PEs required to hold an object, the ODP size can be determined. The number of objects detected per
patch is random in nature, but for the purpose of the simulation this approximation suffices.
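The ODP sizing calculation can be written out as follows. The pixel scale (angular size of one CCD pixel) and the patch size are hypothetical example values, since they are not given in this section; the 30,000 objects/sq.deg. density is the Spaceguard figure quoted above, and the PEs per object would come from the stamp layout of Section 4.2.2.

```python
import math

ARCSEC_PER_DEG = 3600.0

def odp_size(density_per_sqdeg: float, patch_pixels: int,
             pixel_scale_arcsec: float, pes_per_object: int):
    """Return (average objects per patch, ODP PEs needed for one patch)."""
    pixel_area_sqdeg = (pixel_scale_arcsec / ARCSEC_PER_DEG) ** 2
    objects = density_per_sqdeg * patch_pixels * pixel_area_sqdeg
    return objects, math.ceil(objects) * pes_per_object
```

For example, a 4096-pixel patch at an assumed 1 arcsec/pixel covers about 3.2e-4 sq.deg., giving roughly 9.5 objects and, at 30 PEs per object, 300 ODP PEs per patch.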
Number of PEs   1 CCD column        2 CCD columns       3 CCD columns
?               (15,3)   (5,10)     (15,1)   (5,5)      (15,1)   (5,3)
?               (15,6)   (9,10)     (15,3)   (5,9)      (15,2)   (5,6)
?               (15,8)   (11,10)    (15,4)   (5,10)     (15,2)   (5,8)
?               (15,10)  (15,10)    (15,5)   (7,10)     (15,3)   (5,10)
?               (15,10)  (15,10)    (15,6)   (9,10)     (15,4)   (5,10)
?               (15,10)  (15,10)    (15,10)  (15,10)    (15,7)   (9,10)
?               (15,10)  (15,10)    (15,10)  (15,10)    (15,10)  (15,10)

Table 5.4 Performance of various PE configurations for a given object size (O) and sidereal rate (S), represented as
(O,S). For each number of CCD columns, the first pair shows the maximum object size (O_max) possible with the minimum
sidereal rate (S_min) and the second the minimum object size (O_min) with the maximum sidereal rate (S_max), for a 1-stage pipeline.

It can be seen that the 1-stage pipelined system is comparable to, if not better than, the 2-stage pipelined
system for a similar PE count. This can be attributed to the fact that the larger effective array of
the 1-stage system considerably reduces the overheads due to the overlaps necessary during patching.
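Why a larger effective array helps can be seen from the overlap each patch needs. Assuming, for illustration, that a filter of radius r requires r extra pixels on each side of a patch along one axis, the fraction of a patch spent on duplicated pixels is 2r/W for patch width W, which shrinks as the array, and hence W, grows.

```python
def overlap_fraction(patch_width: int, mask_radius: int) -> float:
    """Fraction of a patch occupied by pixels duplicated from neighbouring patches."""
    if patch_width <= 2 * mask_radius:
        raise ValueError("patch is too narrow for the mask")
    return 2 * mask_radius / patch_width
```

For a 15-pixel mask (r = 7), a 64-pixel-wide patch wastes about 22% of its pixels on overlap, while a 256-pixel-wide patch wastes under 6%.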
6

Results  and  Conclusion 


The Spaceguard report has provided a benchmark on which detection programs can build. Covering
6000 to 10000 sq. deg./month or more at a limiting magnitude of around 24, discovery completeness
could be maintained with computational requirements estimated at 200-300 times those available at
Spacewatch. However, astronomers will continue to drive their instruments to cover larger areas, using
new techniques and devices such as optically multiplexed arrays [CRA94] to remove inter-CCD dead space,
charge injection devices allowing non-destructive individual pixel read-out [NIN94] and photon intensifiers
to improve signal-to-noise ratios [FOR94], in order to increase their individual observational
efficiencies [LES94]. As sensors become increasingly advanced and algorithms more complex,
computation requirements will indeed soar.

The approach adopted in this initial study of the feasibility of applying ASP technology to NEO detection has been straightforward. Two processing partitions have been considered, namely a Time Dependent Processing (TDP) stage that reduces images to extract objects, and an Object Dependent Processing (ODP) stage that handles all processing related to objects. Matching the implementation to this task partition enhances overall performance and allows effective tuning to specific system changes and upgrades. The TDP stage can then be scaled to meet changes in image size (e.g. larger and more numerous CCD images), changes in the processing time window (e.g. shorter acquisition times with more advanced sensors) and increased TDP task complexity. The ODP stage, on the other hand, depends on the number of objects expected per square degree and the resulting information extraction requirements.
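The scaling behaviour of the two partitions can be summarised as a simple workload model. In the sketch below, the TDP load is assumed to be proportional to the pixel operations per second (image count times image size times a per-pixel complexity factor, divided by the time window), and the ODP load to the objects per second (object density times area covered, divided by the window). The `Window` record, its fields, the function names and the example values are all hypothetical and serve only to illustrate the partition.

```python
from dataclasses import dataclass

@dataclass
class Window:
    """One acquisition/processing window (hypothetical model)."""
    n_ccds: int                 # CCD images delivered in the window
    ccd_pixels: int             # pixels per CCD image
    window_s: float             # time available to process the window, seconds
    area_sq_deg: float          # sky area imaged in the window
    objects_per_sq_deg: float   # expected object density
    ops_per_pixel: float = 1.0  # stand-in for TDP task complexity

def tdp_load(w: Window) -> float:
    """Time Dependent Processing: pixel operations per second. Grows with
    image size and count, task complexity, and a shorter time window."""
    return w.n_ccds * w.ccd_pixels * w.ops_per_pixel / w.window_s

def odp_load(w: Window) -> float:
    """Object Dependent Processing: objects to handle per second. Set by
    object density and area covered, not by pixel count."""
    return w.objects_per_sq_deg * w.area_sq_deg / w.window_s

if __name__ == "__main__":
    base = Window(n_ccds=16, ccd_pixels=2048 * 2048, window_s=60.0,
                  area_sq_deg=4.0, objects_per_sq_deg=500.0)
    # Same sky area imaged with four times as many pixels per CCD.
    finer = Window(16, 4096 * 4096, 60.0, 4.0, 500.0)
    print("TDP ratio:", tdp_load(finer) / tdp_load(base))
    print("ODP ratio:", odp_load(finer) / odp_load(base))
```

In this model, a finer pixel scale over the same sky area quadruples the TDP load while leaving the ODP load unchanged. This is why the two stages can be sized independently.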

As shown in the previous chapter, the 1-stage pipelined implementation requires the same number of processing elements as the 2-stage implementation. Because it needs only one controller, the 1-stage implementation is more cost-effective than the 2-stage implementation. Fig 6.1 gives the performance of various machine sizes with respect to the number of CCDs and the acquisition time per CCD, for objects of 15 pixels. In a more realistic scenario, the object size would in most cases be about 9 pixels; Fig 6.2 shows the resulting performance.


Fig 6.1. Performance of various configurations of the
Modular-MPC with regard to the number of CCDs and the
acquisition time for objects of size 15 pixels.


Fig 6.2. Performance of various configurations of the
Modular-MPC with regard to the number of CCDs and the
acquisition time for objects of size 9 pixels.

Thus, depending on the application requirements (i.e. CCD read-out rate, maximum object size and number of objects per sq. deg.), the appropriate system size can be chosen. At O.C.A., the 30 cm focal plane of the Schmidt telescope can house up to 16 Kodak 2K x 2K CCDs in a staggered arrangement per column. At sidereal rate, each CCD can be read out completely in 79.8 s. However, read-out at 6 to 10 times sidereal rate is expected, and with object sizes of 9 pixels a 32K PE system with 32 MBytes of memory would suffice. At lower sidereal rates larger objects can be handled, whereas at faster sidereal rates only smaller objects can be handled.
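The read-out figures above translate into per-CCD time budgets and pixel rates. The sketch assumes that "2K x 2K" means 2048 x 2048 pixels and that read-out time is inversely proportional to the multiple of sidereal rate; both are assumptions for illustration. Under them, the 79.8 s budget at sidereal rate shrinks to 13.3 s at 6 times and 7.98 s at 10 times sidereal rate.

```python
CCD_SIDE = 2048            # assumption: "2K x 2K" taken as 2048 x 2048 pixels
SIDEREAL_READOUT_S = 79.8  # full read-out time of one CCD at sidereal rate

def readout_time_s(multiple_of_sidereal: float) -> float:
    """Time to read one CCD at a multiple of the sidereal rate, assuming the
    read-out time is inversely proportional to that multiple."""
    return SIDEREAL_READOUT_S / multiple_of_sidereal

def pixel_rate(multiple_of_sidereal: float) -> float:
    """Pixels per second delivered by one CCD at the given rate."""
    return CCD_SIDE * CCD_SIDE / readout_time_s(multiple_of_sidereal)

if __name__ == "__main__":
    for m in (1, 6, 10):
        print(f"{m:>2}x sidereal: {readout_time_s(m):6.2f} s per CCD, "
              f"{pixel_rate(m):,.0f} pixels/s per CCD")
```

The shrinking time window per CCD is what limits the object size that a given PE array can handle at faster sidereal rates.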

Current research at Aspex aims to develop MPP boards with 16K PEs each and DSM-SDS boards with 16 MBytes of memory each, stacked together with DSM-PDR boards on an ISM/DSM motherboard. A full SIMD configuration would comprise 4 MPP, 4 DSM-SDS and 2 DSM-PDR boards on the ISM/DSM board, providing a 64K PE system with 64 MBytes of memory. The boards would be of standard 6U size, and this configuration would fit comfortably into an industrial workstation, e.g. an HP9000 748i.

Table 5.1 summarises the system board count required for the NEO detection system.


Board Type   Number of Boards   Comments
ISM/DSM      1
MPP          2                  16K PEs/board using 4 x 4K PE MCM-Ds
DSM-SDS      2                  16 MBytes/board with up to 16 data channels
DSM-PDR      0

Table 5.1 Modular-MPC configuration for NEO detection
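The board counts in the table follow from dividing the required PEs and memory by the per-board capacities and rounding up. The sketch below assumes exactly that rule; the helper name `boards_needed` is hypothetical. It reproduces the entries for the 32K PE / 32 MByte system, the 4-board MPP count of the full 64K PE configuration, and the effect of the planned factor-of-four increase in PEs per board.

```python
import math

def boards_needed(required: int, per_board: int) -> int:
    """Smallest number of boards whose combined capacity meets `required`."""
    return math.ceil(required / per_board)

K = 1024
# NEO detection requirement quoted in the text: 32K PEs and 32 MBytes.
pes_required, mbytes_required = 32 * K, 32

mpp_boards = boards_needed(pes_required, 16 * K)  # 16K PEs per MPP board
sds_boards = boards_needed(mbytes_required, 16)   # 16 MBytes per DSM-SDS board

# Planned upgrade: a factor-of-four increase in PEs per MPP board.
mpp_after_upgrade = boards_needed(pes_required, 4 * 16 * K)

# Full SIMD configuration: 64K PEs on 16K-PE boards.
mpp_full_simd = boards_needed(64 * K, 16 * K)

print("MPP:", mpp_boards, "DSM-SDS:", sds_boards,
      "MPP after upgrade:", mpp_after_upgrade, "full SIMD MPP:", mpp_full_simd)
```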


Planned upgrades for next-generation ASP modules target a factor-of-four increase in the number of PEs through stacking and 3D techniques. With the first technology upgrade, the number of MPP boards would fall to 1. Advances in high-density memory packaging technology should also reduce the number of DSM-SDS boards to 1, giving an overall configuration of 1 MPP, 1 DSM-SDS and 1 ISM/DSM board. Further research into 3D MCM and wafer-scale implementation technology would ultimately integrate the MPP, the PDR and the SDS into a single package, resulting in a single-board system for real-time NEO detection.

Thus, it can be concluded that the Modular-MPC approach to the NEO detection problem can provide the necessary performance and cost-effectiveness.